Abstract

Terminological knowledge representation systems (TKRSs) are tools for designing and 
using knowledge bases that make use of terminological languages (or concept languages). 
We analyze from a theoretical point of view a TKRS whose capabilities go beyond the 
ones of presently available TKRSs. The new features studied, often required in practical 
applications, can be summarized in three main points. First, we consider a highly expressive
terminological language, called ALCNR, including general complements of concepts,
number restrictions and role conjunction. Second, we allow inclusion statements between
general concepts to be expressed, with terminological cycles as a particular case. Third, we
prove the decidability of a number of desirable TKRS-deduction services (like satisfiability,
subsumption and instance checking) through a sound, complete and terminating calculus
for reasoning in ALCNR-knowledge bases. Our calculus extends the general technique
of constraint systems. As a byproduct of the proof, we also get the result that inclusion
statements in ALCNR can be simulated by terminological cycles, if descriptive semantics
is adopted.



1. Introduction 

A general characteristic of many proposed terminological knowledge representation systems 
(TKRSs) such as KRYPTON (Brachman, Pigman Gilbert, & Levesque, 1985), NIKL (Kaczmarek,
Bates, & Robins, 1986), BACK (Quantz & Kindermann, 1990), LOOM (MacGregor &
Bates, 1987), CLASSIC (Borgida, Brachman, McGuinness, & Alperin Resnick, 1989), KRIS
(Baader & Hollunder, 1991), K-REP (Mays, Dionne, & Weida, 1991), and others (see Rich,
editor, 1991; Woods & Schmolze, 1992), is that they are made up of two different compo- 
nents. Informally speaking, the first is a general schema concerning the classes of individuals 
to be represented, their general properties and mutual relationships, while the second is a 
(partial) instantiation of this schema, containing assertions relating either individuals to 
classes, or individuals to each other. This characteristic, which the mentioned proposals 
inherit from the seminal TKRS KL-ONE (Brachman & Schmolze, 1985), is shared also by 
several proposals of database models such as Abrial's (1974), CANDIDE (Beck, Gala, &
Navathe, 1989), and TAXIS (Mylopoulos, Bernstein, & Wong, 1980). 

Retrieving information in actual knowledge bases (KBs) built up using one of these sys- 
tems is a deductive process involving both the schema (TBox) and its instantiation (ABox). 
In fact, the TBox is not just a set of constraints on possible ABoxes, but contains intensional
information about classes. This information is taken into account when answering queries 
to the KB. 

During the realization and use of a KB, a TKRS should provide a mechanical solution 
for at least the following problems (from this point on, we use the word concept to refer to 
a class): 

1. KB-satisfiability: are an ABox and a TBox consistent with each other? That is, does
the KB admit a model? A positive answer is useful in the validation phase, while the
negative answer can be used to make inferences in refutation style. The latter will be
precisely the approach taken in this paper. 

2. Concept Satisfiability: given a KB and a concept C, does there exist at least one 
model of the KB assigning a non-empty extension to C? This is important not only 
to rule out meaningless concepts in the KB design phase, but also in processing the 
user's queries, to eliminate parts of a query which cannot contribute to the answer. 

3. Subsumption: given a KB and two concepts C and D, is C more general than D in
every model of the KB? Subsumption detects implicit dependencies among the concepts
in the KB. 

4. Instance Checking: given a KB, an individual a and a concept C, is a an instance
of C in every model of the KB? Note that retrieving all individuals described by a
given concept (a query in the database lexicon) can be formulated as a set of parallel 
instance checkings. 

The above questions can be precisely characterized once the TKRS is given a semantics 
(see next section), which defines models of the KB and gives a meaning to expressions 
in the KB. Once the problems are formalized, one can start both a theoretical analysis 
of them, and — maybe independently — a search for reasoning procedures accomplishing the 
tasks. Completeness and correctness of procedures can be judged with respect to the formal 
statements of the problems. 

Up to now, all the proposed systems give incomplete procedures for solving the above 
problems 1-4, except for KRIS (see footnote 1). That is, some inferences are missed, in some cases without
a precise semantical characterization of which ones are. If the designer or the user needs
(more) complete reasoning, she/he must either write programs in a suitable programming
language (as in the database proposal of Abrial, and in TAXIS), or define appropriate
inference rules completing the inference capabilities of the system (as in BACK, LOOM, and
CLASSIC). From the theoretical point of view, for several systems (e.g., LOOM) it is not even
known if complete procedures can ever exist, i.e., the decidability of the corresponding
problems is not known.

Footnote 1. The system CLASSIC is also complete, but only w.r.t. a non-standard semantics for the treatment of
individuals. Complete reasoning w.r.t. standard semantics for individuals is not provided, and is coNP-hard
(Lenzerini & Schaerf, 1991).

Recent research on the computational complexity of subsumption had an influence in
many TKRSs on the choice for incomplete procedures. Brachman and Levesque (1984)
started this research by analyzing the complexity of subsumption between pure concept
expressions, abstracting from KBs (we refer to this problem later in the paper as pure
subsumption). The motivation for focusing on such a small problem was that pure subsumption is
a fundamental inference in any TKRS. It turned out that pure subsumption is tractable
(i.e., worst-case polynomial-time solvable) for simple languages, and intractable for slight
extensions of such languages, as subsequent research definitely confirmed (Nebel, 1988;
Donini, Lenzerini, Nardi, & Nutt, 1991a, 1991b; Schmidt-Schauß & Smolka, 1991; Donini,
Hollunder, Lenzerini, Marchetti Spaccamela, Nardi, & Nutt, 1992). Also, beyond
computational complexity, pure subsumption was proved undecidable in the TKRSs U (Schild,
1988), KL-ONE (Schmidt-Schauß, 1989) and NIKL (Patel-Schneider, 1989).

Note that extending the language results in enhancing its expressiveness; therefore the
result of that research could be summarized as: the more expressive a TKRS language is,
the higher the computational complexity of reasoning in that language, as Levesque
(1984) first noted. This result has been interpreted in two different ways, leading to two
different TKRS design philosophies:

1. 'General-purpose languages for TKRSs are intractable, or even undecidable, and 
tractable languages are not expressive enough to be of practical interest'. Following
this interpretation, in several TKRSs (such as NIKL, LOOM and BACK) incomplete
procedures for pure subsumption are considered satisfactory (e.g., see MacGregor &
Brill, 1992, for LOOM). Once completeness is abandoned for this basic subproblem,
completeness of overall reasoning procedures is not an issue anymore; but other issues 
arise, such as how to compare incomplete procedures (Heinsohn, Kudenko, Nebel, 
& Profitlich, 1992), and how to judge a procedure "complete enough" (MacGregor, 
1991). As a practical tool, inference rules can be used in such systems to achieve the 
expected behavior of the KB w.r.t. the information contained in it. 

2. 'A TKRS is (by definition) general-purpose, hence it must provide tractable and 
complete reasoning to a user'. Following this line, other TKRSs (such as KRYPTON 
and CLASSIC) provide limited tractable languages for expressing concepts, following
the "small-can-be-beautiful" approach (see Patel-Schneider, 1984). The gap between 
what is expressible in the TKRS language and what is needed to be expressed for the 
application is then filled by the user, by a (sort of) programming with inference rules. 
Of course, the usual problems present in program development and debugging arise 
(McGuinness, 1992). 

What is common to both approaches is that a user must cope with incomplete reasoning. 
The difference is that in the former approach, the burden of regaining useful yet missed 
inferences is mostly left to the developers of the TKRS (and the user is supposed to specify 
what is "complete enough"), while in the latter this is mainly left to the user. These 
are perfectly reasonable approaches in a practical context, where incomplete procedures 
and specialized programs are often used to deal with intractable problems. In our opinion,
incomplete procedures are just a provisional answer to the problem, the best possible up to
now. In order to improve on such an answer, a theoretical analysis of the general problems
1-4 is needed.

Previous theoretical results do not deal with the problems 1-4 in their full generality. 
For example, the problems are studied in (Nebel, 1990, Chapter 4), but only incomplete 
procedures are given, and cycles are not considered. In (Donini, Lenzerini, Nardi, & Schaerf,
1993; Schaerf, 1993a) the complexity of instance checking has been analyzed, but only KBs 
without a TBox are treated. Instance checking has also been analyzed in (Vilain, 1991), 
but addressing only that part of the problem which can be performed as parsing. 

In addition, we think that the expressiveness of actual systems should be enhanced
by making terminological cycles (see Nebel, 1990, Chapter 5) available in TKRSs. Such a
feature is of undoubted practical interest (MacGregor, 1992), yet most present TKRSs
can only approximate cycles, by using forward inference rules (as in BACK, CLASSIC, LOOM).
In our opinion, in order to make terminological cycles fully available in complete TKRSs, a 
theoretical investigation is still needed. 

Previous theoretical work on cycles was done in (Baader, 1990a, 1990b; Baader, Bürkert,
Hollunder, Nutt, & Siekmann, 1990; Dionne, Mays, & Oles, 1992, 1993; Nebel, 1990, 1991;
Schild, 1991), but considering KBs formed by the TBox alone. Moreover, these approaches 
do not deal with number restrictions (except for Nebel, 1990, Section 5.3.5) — a basic feature 
already provided by TKRSs — and the techniques used do not seem easily extensible to 
reasoning with ABoxes. We compare in detail several of these works with ours in Section 4. 

In this paper, we propose a TKRS equipped with a highly expressive language, includ- 
ing constructors often required in practical applications, and prove decidability of problems 
1-4. In particular, our system uses the language ALCNR, which supports general complements
of concepts, number restrictions and role conjunction. Moreover, the system allows
one to express inclusion statements between general concepts and, as a particular case, 
terminological cycles. We prove decidability by means of a suitable calculus, which is
developed by extending the well-established framework of constraint systems (see Donini et al.,
1991a; Schmidt-Schauß & Smolka, 1991), thus exploiting a uniform approach to reasoning
in TKRSs. Moreover, our calculus can easily be turned into a decision procedure.

The paper is organized as follows. In Section 2 we introduce the language, and we 
give it a Tarski-style extensional semantics, which is the most commonly used. Using this 
semantics, we establish relationships between problems 1-4 which allow us to concentrate 
on KB-satisfiability only. In Section 3 we provide a calculus for KB-satisfiability, and show 
correctness and termination of the calculus. Hence, we conclude that KB-satisfiability is 
decidable in ALCNR, which is the main result of this paper. In Section 4 we compare our
approach with previous results on decidable TKRSs, and we establish the equivalence of 
general (cyclic) inclusion statements and general concept definitions using the descriptive 
semantics. Finally, we discuss in detail several practical issues related to our results in 
Section 5. 

2. Preliminaries 

In this section we first present the basic notions regarding concept languages. Then we 
describe knowledge bases built up using concept languages, and reasoning services that 
must be provided for extracting information from such knowledge bases. 

2.1 Concept Languages 

In concept languages, concepts represent the classes of objects in the domain of interest, 
while roles represent binary relations between objects. Complex concepts and roles can be 
defined by means of suitable constructors applied to concept names and role names. In
particular, concepts and roles in ALCNR can be formed by means of the following syntax
(where $P_i$ (for $i = 1, \ldots, k$) denotes a role name, C and D denote arbitrary concepts, and
R an arbitrary role):

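\[
\begin{array}{rcll}
C, D & \longrightarrow & A \mid & \text{(concept name)} \\
     &                 & \top \mid & \text{(top concept)} \\
     &                 & \bot \mid & \text{(bottom concept)} \\
     &                 & (C \sqcap D) \mid & \text{(conjunction)} \\
     &                 & (C \sqcup D) \mid & \text{(disjunction)} \\
     &                 & \neg C \mid & \text{(complement)} \\
     &                 & \forall R.C \mid & \text{(universal quantification)} \\
     &                 & \exists R.C \mid & \text{(existential quantification)} \\
     &                 & (\geq n\, R) \mid (\leq n\, R) & \text{(number restrictions)} \\[4pt]
R    & \longrightarrow & P_1 \sqcap \cdots \sqcap P_k & \text{(role conjunction)}
\end{array}
\]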

When no confusion arises we drop the brackets around conjunctions and disjunctions. 
We interpret concepts as subsets of a domain and roles as binary relations over a domain. 
More precisely, an interpretation $I = (\Delta^I, \cdot^I)$ consists of a nonempty set $\Delta^I$ (the domain
of I) and a function $\cdot^I$ (the extension function of I), which maps every concept to a subset
of $\Delta^I$ and every role to a subset of $\Delta^I \times \Delta^I$. The interpretation of concept names and
role names is thus restricted by $A^I \subseteq \Delta^I$, and $P^I \subseteq \Delta^I \times \Delta^I$, respectively. Moreover,
the interpretation of complex concepts and roles must satisfy the following equations ($\sharp\{\cdot\}$
denotes the cardinality of a set):





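\[
\begin{array}{rcl}
\top^I & = & \Delta^I \\
\bot^I & = & \emptyset \\
(C \sqcap D)^I & = & C^I \cap D^I \\
(C \sqcup D)^I & = & C^I \cup D^I \\
(\neg C)^I & = & \Delta^I \setminus C^I \\
(\forall R.C)^I & = & \{ d_1 \in \Delta^I \mid \forall d_2 : (d_1, d_2) \in R^I \rightarrow d_2 \in C^I \} \\
(\exists R.C)^I & = & \{ d_1 \in \Delta^I \mid \exists d_2 : (d_1, d_2) \in R^I \wedge d_2 \in C^I \} \\
(\geq n\, R)^I & = & \{ d_1 \in \Delta^I \mid \sharp\{ d_2 \mid (d_1, d_2) \in R^I \} \geq n \} \\
(\leq n\, R)^I & = & \{ d_1 \in \Delta^I \mid \sharp\{ d_2 \mid (d_1, d_2) \in R^I \} \leq n \} \\
(P_1 \sqcap \cdots \sqcap P_k)^I & = & P_1^I \cap \cdots \cap P_k^I
\end{array}
\]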



2.2 Knowledge Bases 

A knowledge base built by means of concept languages is generally formed by two compo- 
nents: The intensional one, called TBox, and the extensional one, called ABox. 

We first turn our attention to the TBox. As we said before, the intensional level spec- 
ifies the properties of the concepts of interest in a particular application. Syntactically, 
such properties are expressed in terms of what we call inclusion statements. An inclusion 
statement (or simply inclusion) has the form

$C \sqsubseteq D$

where C and D are two arbitrary ALCNR-concepts. Intuitively, the statement specifies
that every instance of C is also an instance of D. More precisely, an interpretation I satisfies
the inclusion $C \sqsubseteq D$ if $C^I \subseteq D^I$.

A TBox is a finite set of inclusions. An interpretation I is a model for a TBox T if I 
satisfies all inclusions in T. 

In general, TKRSs provide the user with mechanisms for stating concept introductions 
(e.g., Nebel, 1990, Section 3.2) of the form A = D (concept definition, interpreted as set 
equality), or A < D (concept specification, interpreted as set inclusion), with the restrictions 
that the left-hand side concept A must be a concept name, that for each concept name 
at most one introduction is allowed, and that no terminological cycles are allowed, i.e., 
no concept name may occur — neither directly nor indirectly — within its own introduction. 
These restrictions make it possible to substitute an occurrence of a defined concept by its 
definition. 

We do not impose any of these restrictions on the form of inclusions, obtaining statements
that are syntactically more expressive than concept introductions. In particular, a definition
of the form A = D can be expressed in our system using the pair of inclusions $A \sqsubseteq D$
and $D \sqsubseteq A$, and a specification of the form A < D can be simply expressed by $A \sqsubseteq D$.
Conversely, an inclusion of the form $C \sqsubseteq D$, where C and D are arbitrary concepts, cannot
be expressed with concept introductions. Moreover, cyclic inclusions are allowed in our
statements, realizing terminological cycles.

As shown in (Nebel, 1991), there are at least three types of semantics for terminolog- 
ical cycles, namely the least fixpoint, the greatest fixpoint, and the descriptive semantics. 
Fixpoint semantics choose particular models among the set of interpretations that satisfy a 
statement of the form A = D. Such models are chosen as the least and the greatest fixpoint 
of the above equation. The descriptive semantics instead considers all interpretations that 
satisfy the statement (i.e., all fixpoints) as its models. 

However, fixpoint semantics naturally apply only to fixpoint statements like A = D,
where D is a "function" of A, i.e., A may appear in D, and there is no obvious way to
extend them to general inclusions. In addition, since our language includes the constructor
for complement of general concepts, the "function" D may not be monotone, and therefore
the least and the greatest fixpoints may not be unique. Whether there exists a
definitional semantics that is suitable for cyclic definitions in expressive languages is still
unclear.

Conversely, the descriptive semantics interprets statements as just restricting the set of 
possible models, with no definitional import. Although it is not completely satisfactory in all 
practical cases (Baader, 1990b; Nebel, 1991), the descriptive semantics has been considered 
to be the most appropriate one for general cyclic statements in powerful concept languages. 
Hence, it seems to be the most suitable to be extended to our case and it is exactly the one 
we have adopted above. 

Observe that our decision to put general inclusions in the TBox is not a standard one. In
fact, in TKRSs like KRYPTON such statements were put in the ABox. However, we conceive
inclusions as a generalization of traditional TBox statements: acyclic concept introductions,
with their definitional import, can be perfectly expressed with inclusions; and cyclic concept 
introductions can be expressed as well, if descriptive semantics is adopted. Therefore, we 
believe that inclusions should be part of the TBox. 

Notice that role conjunction allows one to express the practical feature of subroles. For
example, the role ADOPTEDCHILD can be written as $\mathrm{CHILD} \sqcap \mathrm{ADOPTEDCHILD}'$, where
ADOPTEDCHILD' is a role name, making it a subrole of CHILD. Following this idea, every hierarchy
of role names can be rephrased with a set of role conjunctions, and vice versa.

Actual systems usually provide for the construction of hierarchies of roles by means of
role introductions (i.e., statements of the form P = R and P < R) in the TBox. However,
in our simple language for roles, cyclic definitions of roles can always be reduced to acyclic
definitions, as explained in (Nebel, 1990, Sec. 5.3.1). When role definitions are acyclic, one
can always substitute in every concept each role name with its definition, obtaining an 
equivalent concept. Therefore, we do not consider role definitions in this paper, and we 
conceive the TBox just as a set of concept inclusions. 

Even so, it is worth noting that concept inclusions can express knowledge about roles.
In particular, domain and range restrictions of roles can be expressed, in a way similar to
the one in (Catarci & Lenzerini, 1993). Restricting the domain of a role R to a concept C
and its range to a concept D can be done by the two inclusions

$\exists R.\top \sqsubseteq C, \qquad \top \sqsubseteq \forall R.D$

It is straightforward to show that if an interpretation I satisfies the two inclusions, then
$R^I \subseteq C^I \times D^I$.
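The claim follows directly from the semantics of the existential and universal quantifiers given in Section 2.1. A short derivation, taking an arbitrary pair $(d_1, d_2) \in R^I$ (the names $d_1$, $d_2$ are only illustrative):

```latex
\begin{align*}
(d_1, d_2) \in R^I
  &\;\Rightarrow\; d_1 \in (\exists R.\top)^I
     && \text{since } d_2 \in \top^I = \Delta^I \\
  &\;\Rightarrow\; d_1 \in C^I
     && \text{since } I \text{ satisfies } \exists R.\top \sqsubseteq C \\[6pt]
d_1 \in \Delta^I = \top^I
  &\;\Rightarrow\; d_1 \in (\forall R.D)^I
     && \text{since } I \text{ satisfies } \top \sqsubseteq \forall R.D \\
  &\;\Rightarrow\; d_2 \in D^I
     && \text{since } (d_1, d_2) \in R^I
\end{align*}
```

Hence every pair in $R^I$ has its first component in $C^I$ and its second component in $D^I$.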

Combining subroles with domain and range restrictions it is also possible to partially
express the constructor for role restriction, which is present in various proposals (e.g.,
the language in Brachman & Levesque, 1984). Role restriction, written as R : C, is
defined by $(R : C)^I = \{(d_1, d_2) \in \Delta^I \times \Delta^I \mid (d_1, d_2) \in R^I \wedge d_2 \in C^I\}$. For example the
role DAUGHTER, which can be formulated as CHILD : Female, can be partially simulated by
$\mathrm{CHILD} \sqcap \mathrm{DAUGHTER}'$, with the inclusion $\top \sqsubseteq \forall \mathrm{DAUGHTER}'.\mathrm{Female}$. However, this simulation
would not be complete in number restrictions: e.g., if a mother has at least three daughters,
then we know she has at least three female children; if instead we know that she has three
female children we cannot infer that she has three daughters.

We can now turn our attention to the extensional level, i.e., the ABox. The ABox 
essentially allows one to specify instance-of relations between individuals and concepts, and 
between pairs of individuals and roles. 

Let O be an alphabet of symbols, called individuals. Instance-of relationships are ex- 
pressed in terms of membership assertions of the form: 

C(a), R(a,b), 

where a and b are individuals, C is an ALCNR-concept, and R is an ALCNR-role.
Intuitively, the first form states that a is an instance of C, whereas the second form states that
a is related to b by means of the role R.
In order to assign a meaning to membership assertions, the extension function $\cdot^I$ of an
interpretation I is extended to individuals by mapping them to elements of $\Delta^I$ in such a
way that $a^I \neq b^I$ if $a \neq b$. This property is called Unique Name Assumption; it ensures
that different individuals are interpreted as different objects.

An interpretation I satisfies the assertion C(a) if $a^I \in C^I$, and satisfies R(a, b) if
$(a^I, b^I) \in R^I$. An ABox is a finite set of membership assertions. I is a model for an ABox
A if I satisfies all the assertions in A.

An ALCNR-knowledge base $\Sigma$ is a pair $\Sigma = \langle T, A \rangle$ where T is a TBox and A is an
ABox. An interpretation I is a model for $\Sigma$ if it is both a model for T and a model for A.

We can now formally define the problems 1-4 mentioned in the introduction. Let $\Sigma$ be
an ALCNR-knowledge base.

1. KB-satisfiability: $\Sigma$ is satisfiable, if it has a model;

2. Concept Satisfiability: C is satisfiable w.r.t. $\Sigma$, if there exists a model I of $\Sigma$ such that
$C^I \neq \emptyset$;

3. Subsumption: C is subsumed by D w.r.t. $\Sigma$, if $C^I \subseteq D^I$ for every model I of $\Sigma$;

4. Instance Checking: a is an instance of C, written $\Sigma \models C(a)$, if the assertion C(a) is
satisfied in every model of $\Sigma$.

In (Nebel, 1990, Sec. 3.3.2) it is shown that the ABox plays no active role when checking
concept satisfiability and subsumption. In particular, Nebel shows that the ABox (subject
to its satisfiability) can be replaced by an empty one without affecting the result of those
services. Actually, in (Nebel, 1990), the above property is stated for a language less expressive
than ALCNR. However, it is easy to show that it extends to ALCNR. It is important
to remark that such a property is not valid for all concept languages. In fact, there are 
languages that include some constructors that refer to the individuals in the concept lan- 
guage, e.g., the constructor ONE-OF (Borgida et al., 1989) that forms a concept from a set of 
enumerated individuals. If a concept language includes such a constructor the individuals 
in the TBox can interact with the individuals in the ABox, as shown in (Schaerf, 1993b). 
As a consequence, both concept satisfiability and subsumption depend also on the ABox. 

Example 2.1 Consider the following knowledge base $\Sigma = \langle T, A \rangle$:

\[
\begin{array}{rcl}
T & = & \{\ \exists \mathrm{TEACHES}.\mathrm{Course} \sqsubseteq (\mathrm{Student} \sqcap \exists \mathrm{DEGREE}.\mathrm{BS}) \sqcup \mathrm{Prof}, \\
  &   & \ \ \mathrm{Prof} \sqsubseteq \exists \mathrm{DEGREE}.\mathrm{MS}, \\
  &   & \ \ \exists \mathrm{DEGREE}.\mathrm{MS} \sqsubseteq \exists \mathrm{DEGREE}.\mathrm{BS}, \\
  &   & \ \ \mathrm{MS} \sqcap \mathrm{BS} \sqsubseteq \bot\ \} \\[4pt]
A & = & \{\ \mathrm{TEACHES}(\mathrm{john}, \mathrm{cs156}),\ (\leq 1\ \mathrm{DEGREE})(\mathrm{john}),\ \mathrm{Course}(\mathrm{cs156})\ \}
\end{array}
\]

$\Sigma$ is a fragment of a hypothetical knowledge base describing the organization of a university.
The first inclusion, for instance, states that the persons teaching a course are either graduate
students (students with a BS degree) or professors. It is easy to see that $\Sigma$ is satisfiable. For
example, the following interpretation I satisfies all the inclusions in T and all the assertions
in A, and therefore it is a model for $\Sigma$:

\[
\begin{array}{l}
\Delta^I = \{\mathrm{John}, \mathrm{cs156}, \mathrm{csb}\},\quad \mathrm{john}^I = \mathrm{John},\quad \mathrm{cs156}^I = \mathrm{cs156} \\
\mathrm{Student}^I = \{\mathrm{John}\},\quad \mathrm{Prof}^I = \emptyset,\quad \mathrm{Course}^I = \{\mathrm{cs156}\},\quad \mathrm{BS}^I = \{\mathrm{csb}\} \\
\mathrm{MS}^I = \emptyset,\quad \mathrm{TEACHES}^I = \{(\mathrm{John}, \mathrm{cs156})\},\quad \mathrm{DEGREE}^I = \{(\mathrm{John}, \mathrm{csb})\}
\end{array}
\]

We have described the interpretation I by giving only $\Delta^I$, and the values of $\cdot^I$ on
concept names and role names. It is straightforward to see that all values of $\cdot^I$ on complex
concepts and roles are uniquely determined by imposing that I must satisfy the equations
given in Section 2.1.

Notice that it is possible to draw several non-trivial conclusions from $\Sigma$. For example, we
can infer that $\Sigma \models \mathrm{Student}(\mathrm{john})$. Intuitively this can be shown as follows: John teaches
a course, thus he is either a student with a BS or a professor. But he can't be a professor
since professors have at least two degrees (BS and MS) and he has at most one, therefore
he is a student. □
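The model check in Example 2.1 can be reproduced mechanically. The following is a minimal sketch, not part of the paper: it assumes a hypothetical tuple encoding of concepts (a string for a concept name, `("and", C, D)`, `("some", R, C)`, `("atmost", n, R)`, and so on), roles encoded as tuples of role names, and an interpretation stored as a plain dictionary. The function names `ext`, `role_ext`, `succ` and `is_model` are likewise assumptions made for illustration.

```python
def role_ext(role, I):
    """Extension of a role conjunction (P1, ..., Pk), k >= 1: intersection of the P_i."""
    return set.intersection(*(set(I["roles"].get(p, ())) for p in role))

def succ(d, role, I):
    """All d2 such that (d, d2) belongs to the extension of the role."""
    return {d2 for (d1, d2) in role_ext(role, I) if d1 == d}

def ext(c, I):
    """Extension of concept c in the finite interpretation I (semantics of Section 2.1)."""
    dom = I["domain"]
    if isinstance(c, str):                      # concept name
        return set(I["concepts"].get(c, ()))
    op = c[0]
    if op == "top":
        return set(dom)
    if op == "bot":
        return set()
    if op == "and":
        return ext(c[1], I) & ext(c[2], I)
    if op == "or":
        return ext(c[1], I) | ext(c[2], I)
    if op == "not":
        return set(dom) - ext(c[1], I)
    if op == "all":                             # ("all", role, concept)
        return {d for d in dom if succ(d, c[1], I) <= ext(c[2], I)}
    if op == "some":                            # ("some", role, concept)
        return {d for d in dom if succ(d, c[1], I) & ext(c[2], I)}
    if op == "atleast":                         # ("atleast", n, role)
        return {d for d in dom if len(succ(d, c[2], I)) >= c[1]}
    if op == "atmost":                          # ("atmost", n, role)
        return {d for d in dom if len(succ(d, c[2], I)) <= c[1]}
    raise ValueError(f"unknown constructor: {op!r}")

def is_model(I, tbox, abox):
    """True if I satisfies every inclusion (C, D) in tbox and every assertion in abox."""
    ind = I["individuals"]
    if len(set(ind.values())) != len(ind):      # Unique Name Assumption
        return False
    if any(not ext(c, I) <= ext(d, I) for c, d in tbox):
        return False
    for a in abox:
        if a[0] == "concept":                   # ("concept", C, a)
            if ind[a[2]] not in ext(a[1], I):
                return False
        elif (ind[a[2]], ind[a[3]]) not in role_ext(a[1], I):   # ("role", R, a, b)
            return False
    return True

# The knowledge base and interpretation of Example 2.1 in this encoding.
TEACHES, DEGREE = ("TEACHES",), ("DEGREE",)
tbox = [
    (("some", TEACHES, "Course"),
     ("or", ("and", "Student", ("some", DEGREE, "BS")), "Prof")),
    ("Prof", ("some", DEGREE, "MS")),
    (("some", DEGREE, "MS"), ("some", DEGREE, "BS")),
    (("and", "MS", "BS"), ("bot",)),
]
abox = [
    ("role", TEACHES, "john", "cs156"),
    ("concept", ("atmost", 1, DEGREE), "john"),
    ("concept", "Course", "cs156"),
]
I = {
    "domain": {"John", "cs156", "csb"},
    "individuals": {"john": "John", "cs156": "cs156"},
    "concepts": {"Student": {"John"}, "Prof": set(), "Course": {"cs156"},
                 "BS": {"csb"}, "MS": set()},
    "roles": {"TEACHES": {("John", "cs156")}, "DEGREE": {("John", "csb")}},
}
print(is_model(I, tbox, abox))
```

Checking one finite interpretation only confirms satisfiability. Conclusions that quantify over all models, such as instance checking, require a decision procedure like the calculus of Section 3.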

Given the previous semantics, the problems 1-4 can all be reduced to KB-satisfiability
(or to its complement) in linear time. In fact, given a knowledge base $\Sigma = \langle T, A \rangle$, two
concepts C and D, an individual a, and an individual b not appearing in $\Sigma$, the following
equivalences hold:

C is satisfiable w.r.t. $\Sigma$ iff $\langle T, A \cup \{C(b)\} \rangle$ is satisfiable.
C is subsumed by D w.r.t. $\Sigma$ iff $\langle T, A \cup \{(C \sqcap \neg D)(b)\} \rangle$ is not satisfiable.

$\Sigma \models C(a)$ iff $\langle T, A \cup \{(\neg C)(a)\} \rangle$ is not satisfiable.
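The three equivalences are purely syntactic transformations of the ABox, which is why they take linear time. A minimal sketch under assumptions: concepts are encoded as nested tuples (`("and", C, D)`, `("not", C)`, a string for a concept name), assertions as `("concept", C, a)` or `("role", R, a, b)`, and the helper names below are hypothetical. The satisfiability test itself is not included here.

```python
def individuals(abox):
    """Names of all individuals occurring in the ABox."""
    names = set()
    for assertion in abox:
        names.update(assertion[2:])     # the fields after the tag and the concept/role
    return names

def fresh_individual(abox, base="b"):
    """An individual name that does not appear in the ABox."""
    used = individuals(abox)
    i = 0
    while f"{base}{i}" in used:
        i += 1
    return f"{base}{i}"

def concept_sat_kb(tbox, abox, c):
    """C is satisfiable w.r.t. (T, A) iff the returned KB is satisfiable."""
    b = fresh_individual(abox)
    return tbox, abox + [("concept", c, b)]

def subsumption_kb(tbox, abox, c, d):
    """C is subsumed by D w.r.t. (T, A) iff the returned KB is NOT satisfiable."""
    b = fresh_individual(abox)
    return tbox, abox + [("concept", ("and", c, ("not", d)), b)]

def instance_kb(tbox, abox, c, a):
    """(T, A) |= C(a) iff the returned KB is NOT satisfiable."""
    return tbox, abox + [("concept", ("not", c), a)]
```

For instance, `instance_kb(tbox, abox, "Student", "john")` adds the assertion (¬Student)(john) to the ABox of Example 2.1; the entailment holds exactly when the resulting knowledge base has no model.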

A slightly different form of these equivalences has been given in (Hollunder, 1990). The
equivalences given here are a straightforward consequence of the ones given by Hollunder.
However, the above equivalences are not valid for languages including constructors that refer 
to the individuals in the concept language. The equivalences between reasoning services in 
such languages are studied in (Schaerf, 1993b). 

Based on the above equivalences, in the next section we concentrate just on KB- 
satisfiability. 

3. Decidability Result 

In this section we provide a calculus for deciding KB-satisfiability. In particular, in Subsec- 
tion 3.1 we present the calculus and we state its correctness. Then, in Subsection 3.2, we 
prove the termination of the calculus. This will be sufficient to assess the decidability of all 
problems 1-4, thanks to the relationships between the four problems. 

3.1 The calculus and its correctness 

Our method makes use of the notion of constraint system (Donini et al., 1991a; Schmidt-Schauß
& Smolka, 1991; Donini, Lenzerini, Nardi, & Schaerf, 1991c), and is based on a
tableaux-like calculus (Fitting, 1990) that tries to build a model for the logical formula 
corresponding to a KB. 




BUCHHEIT, DONINI, & SCHAERF 



We introduce an alphabet of variable symbols V together with a well-founded total
ordering $\prec$ on V. The alphabet V is disjoint from the other ones defined so far. The
purpose of the ordering will become clear later. The elements of V are denoted by the
letters x, y, z, w. From this point on, we use the term object as an abstraction for individual
and variable (i.e., an object is an element of $O \cup V$). Objects are denoted by the symbols
s, t and, as in Section 2, individuals are denoted by a, b.

A constraint is a syntactic entity of one of the forms:

$s : C, \qquad sPt, \qquad \forall x.x : C, \qquad s \neq t,$

where C is a concept and P is a role name. Concepts are assumed to be simple, i.e., the
only complements they contain are of the form $\neg A$, where A is a concept name. Arbitrary
ALCNR-concepts can be rewritten into equivalent simple concepts in linear time (Donini
et al., 1991a). A constraint system is a finite nonempty set of constraints.
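The rewriting into simple concepts can be sketched as the usual negation normal form transformation, which pushes complements inward one constructor at a time. The sketch below is an assumption about how such a rewriting may look, not the procedure of Donini et al. (1991a); it uses a hypothetical tuple encoding (`("not", C)`, `("and", C, D)`, `("all", R, C)`, `("atleast", n, R)`, and so on) and the hypothetical function name `simple`.

```python
def simple(c):
    """Return an equivalent concept in which 'not' is applied only to concept names."""
    if isinstance(c, str) or c[0] in ("top", "bot", "atleast", "atmost"):
        return c
    op = c[0]
    if op in ("and", "or"):
        return (op, simple(c[1]), simple(c[2]))
    if op in ("all", "some"):
        return (op, c[1], simple(c[2]))
    if op != "not":
        raise ValueError(f"unknown constructor: {op!r}")
    d = c[1]                                   # c is a complement: push it one level inward
    if isinstance(d, str):
        return c                               # a negated concept name is already simple
    k = d[0]
    if k == "top":
        return ("bot",)
    if k == "bot":
        return ("top",)
    if k == "not":                             # double complement cancels
        return simple(d[1])
    if k == "and":                             # De Morgan
        return ("or", simple(("not", d[1])), simple(("not", d[2])))
    if k == "or":                              # De Morgan
        return ("and", simple(("not", d[1])), simple(("not", d[2])))
    if k == "all":                             # not (all R.C) == some R.(not C)
        return ("some", d[1], simple(("not", d[2])))
    if k == "some":                            # not (some R.C) == all R.(not C)
        return ("all", d[1], simple(("not", d[2])))
    if k == "atleast":                         # not (>= n R) == (<= n-1 R), or bottom if n = 0
        return ("atmost", d[1] - 1, d[2]) if d[1] > 0 else ("bot",)
    if k == "atmost":                          # not (<= n R) == (>= n+1 R)
        return ("atleast", d[1] + 1, d[2])
    raise ValueError(f"unknown constructor: {k!r}")
```

Each subconcept is visited once, so the number of steps grows linearly with the size of the input concept.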

Given an interpretation I, we define an I-assignment $\alpha$ as a function that maps every
variable of V to an element of $\Delta^I$, and every individual a to $a^I$ (i.e., $\alpha(a) = a^I$ for all
$a \in O$).

A pair $(I, \alpha)$ satisfies the constraint $s : C$ if $\alpha(s) \in C^I$, the constraint $sPt$ if $(\alpha(s), \alpha(t)) \in P^I$,
the constraint $s \neq t$ if $\alpha(s) \neq \alpha(t)$, and finally, the constraint $\forall x.x : C$ if $C^I = \Delta^I$
(notice that $\alpha$ does not play any role in this case). A constraint system S is satisfiable if
there is a pair $(I, \alpha)$ that satisfies every constraint in S.

An ALCNR-knowledge base $\Sigma = \langle T, A \rangle$ can be translated into a constraint system
$S_\Sigma$ by replacing every inclusion $C \sqsubseteq D \in T$ with the constraint $\forall x.x : \neg C \sqcup D$, every
membership assertion C(a) with the constraint $a : C$, every R(a, b) with the constraints
$aP_1b, \ldots, aP_kb$ if $R = P_1 \sqcap \cdots \sqcap P_k$, and including the constraint $a \neq b$ for every pair (a, b)
of distinct individuals appearing in A. It is easy to see that $\Sigma$ is satisfiable if and only if $S_\Sigma$ is
satisfiable.
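The translation just described can be sketched as follows. The encoding is an assumption made for illustration: constraints are tuples `("in", s, C)` for s: C, `("rel", s, P, t)` for sPt, `("forall", C)` for the universally quantified constraint, and `("neq", s, t)` for inequality; concepts and assertions are nested tuples as well. The sketch stores each inequality once (in sorted order) and does not rewrite the complement into simple form.

```python
from itertools import combinations

def kb_to_constraints(tbox, abox):
    """Translate a knowledge base (tbox, abox) into a set of constraints."""
    system = set()
    for c, d in tbox:
        # An inclusion of C in D becomes a constraint on every object: (not C) or D.
        system.add(("forall", ("or", ("not", c), d)))
    names = set()
    for assertion in abox:
        if assertion[0] == "concept":           # ("concept", C, a)
            _, c, a = assertion
            system.add(("in", a, c))
            names.add(a)
        else:                                   # ("role", (P1, ..., Pk), a, b)
            _, role, a, b = assertion
            for p in role:
                system.add(("rel", a, p, b))
            names.update((a, b))
    # Unique Name Assumption: distinct individuals are pairwise separated.
    for a, b in combinations(sorted(names), 2):
        system.add(("neq", a, b))
    return system
```

The result grows linearly with the TBox and the assertions, plus one inequality per pair of individuals.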

In order to check a constraint system S for satisfiability, our technique adds constraints 
to S until either an evident contradiction is generated or an interpretation satisfying it can 
be obtained from the resulting system. Constraints are added on the basis of a suitable set 
of so-called propagation rules. 

Before providing the rules, we need some additional definitions. Let S be a constraint
system and $R = P_1 \sqcap \cdots \sqcap P_k$ ($k \geq 1$) be a role. We say that t is an R-successor of s in S
if $sP_1t, \ldots, sP_kt$ are in S. We say that t is a direct successor of s in S if for some role R,
t is an R-successor of s. We call direct predecessor the inverse relation of direct successor.
If S is clear from the context we omit it. Moreover, we denote by successor the transitive
closure of the relation direct successor, and we denote by predecessor its inverse.

We assume that variables are introduced in a constraint system according to the ordering
$\prec$. This means that if y is introduced in a constraint system S, then $x \prec y$ for all variables x
that are already in S.

We denote by S[x/s] the constraint system obtained from S by replacing each occurrence 
of the variable x by the object s. 

We say that s and t are separated in S if the constraint $s \neq t$ is in S.

Given a constraint system S and an object s, we define the function $\sigma(\cdot, \cdot)$ as follows:
$\sigma(S, s) := \{C \mid s : C \in S\}$. Moreover, we say that two variables x and y are S-equivalent,
written $x \equiv_S y$, if $\sigma(S, x) = \sigma(S, y)$. Intuitively, two S-equivalent variables can represent the
same element in the potential interpretation built by the rules, unless they are separated.
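These definitions translate almost literally into set operations. A minimal sketch, assuming a constraint system is a Python set of tuples with the hypothetical encoding `("in", s, C)` for s: C and `("neq", s, t)` for an inequality constraint; the function names are illustrative only.

```python
def sigma(system, s):
    """sigma(S, s): the set of concepts C such that the constraint s: C is in S."""
    return frozenset(c[2] for c in system if c[0] == "in" and c[1] == s)

def s_equivalent(system, x, y):
    """x and y are S-equivalent if they are constrained by exactly the same concepts."""
    return sigma(system, x) == sigma(system, y)

def separated(system, s, t):
    """s and t are separated if an inequality between them is in S (either orientation)."""
    return ("neq", s, t) in system or ("neq", t, s) in system
```

Since inequality is symmetric, the sketch accepts either orientation of the stored tuple.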

The propagation rules are: 

1. $S \rightarrow_{\sqcap} \{s : C_1,\ s : C_2\} \cup S$

if 1. $s : C_1 \sqcap C_2$ is in S,

2. $s : C_1$ and $s : C_2$ are not both in S

2. $S \rightarrow_{\sqcup} \{s : D\} \cup S$

if 1. $s : C_1 \sqcup C_2$ is in S,

2. neither $s : C_1$ nor $s : C_2$ is in S,

3. $D = C_1$ or $D = C_2$

3. $S \rightarrow_{\forall} \{t : C\} \cup S$

if 1. $s : \forall R.C$ is in S,

2. t is an R-successor of s,

3. $t : C$ is not in S

4. $S \rightarrow_{\exists} \{sP_1y, \ldots, sP_ky,\ y : C\} \cup S$

if 1. $s : \exists R.C$ is in S,

2. $R = P_1 \sqcap \cdots \sqcap P_k$,

3. y is a new variable,

4. there is no t such that t is an R-successor of s in S and $t : C$ is in S,

5. if s is a variable there is no variable w such that $w \prec s$ and $s \equiv_S w$

5. $S \rightarrow_{\geq} \{sP_1y_i, \ldots, sP_ky_i \mid i \in 1..n\} \cup \{y_i \neq y_j \mid i, j \in 1..n,\ i \neq j\} \cup S$

if 1. $s : (\geq n\, R)$ is in S,

2. $R = P_1 \sqcap \cdots \sqcap P_k$,

3. $y_1, \ldots, y_n$ are new variables,

4. there do not exist n pairwise separated R-successors of s in S,

5. if s is a variable there is no variable w such that $w \prec s$ and $s \equiv_S w$

6. $S \rightarrow_{\leq} S[y/t]$

if 1. $s : (\leq n\, R)$ is in S,

2. s has more than n R-successors in S,

3. y, t are two R-successors of s which are not separated

7. $S \rightarrow_{\forall x} \{s : C\} \cup S$

if 1. $\forall x.x : C$ is in S,

2. s appears in S,

3. $s : C$ is not in S.
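To make the shape of the rules concrete, the following sketch applies only rules 1, 3 and 7 (conjunction, universal quantification, and the universally quantified constraint) until none of them is applicable. It is a simplified illustration under assumptions, not the calculus itself: the disjunction, existential and number-restriction rules are omitted, constraints use a hypothetical tuple encoding (`("in", s, C)`, `("rel", s, P, t)`, `("forall", C)`, `("neq", s, t)`), and the rules are applied in batches rather than one at a time, which yields the same final system for these three rules.

```python
def r_successors(system, s, role):
    """Objects t such that s P t is in the system for every role name P in the conjunction."""
    candidates = {c[3] for c in system if c[0] == "rel" and c[1] == s}
    return {t for t in candidates if all(("rel", s, p, t) in system for p in role)}

def objects(system):
    """All objects (individuals and variables) appearing in the system."""
    found = set()
    for c in system:
        if c[0] == "in":
            found.add(c[1])
        elif c[0] == "rel":
            found.update((c[1], c[3]))
        elif c[0] == "neq":
            found.update((c[1], c[2]))
    return found

def saturate(system):
    """Apply rules 1, 3 and 7 until no new constraint can be added."""
    system = set(system)
    changed = True
    while changed:
        new = set()
        for c in system:
            if c[0] == "in" and isinstance(c[2], tuple):
                s, concept = c[1], c[2]
                if concept[0] == "and":          # rule 1: add both conjuncts
                    new |= {("in", s, concept[1]), ("in", s, concept[2])}
                elif concept[0] == "all":        # rule 3: propagate to R-successors
                    new |= {("in", t, concept[2])
                            for t in r_successors(system, s, concept[1])}
            elif c[0] == "forall":               # rule 7: constrain every object
                new |= {("in", s, c[1]) for s in objects(system)}
        changed = not new <= system
        system |= new
    return system
```

The loop terminates because these three rules introduce no new objects and only add subconcepts of concepts already present. The generating rules are the ones that need condition 5 to keep the system finite.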

We call the rules $\rightarrow_{\sqcup}$ and $\rightarrow_{\leq}$ nondeterministic rules, since they can be applied in
different ways to the same constraint system (intuitively, they correspond to branching
rules of tableaux). All the other rules are called deterministic rules. Moreover, we call the
rules $\rightarrow_{\exists}$ and $\rightarrow_{\geq}$ generating rules, since they introduce new variables in the constraint
system. All other rules are called nongenerating ones.

The use of the condition based on the S-equivalence relation in the generating rules
(condition 5) is related to the goal of keeping the constraint system finite even in presence
of potentially infinite chains of applications of generating rules. Its role will become clearer
later in the paper.
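
The difference between a deterministic and a nondeterministic rule can be illustrated on the conjunction and disjunction rules. The sketch below is not an implementation of the full calculus. It assumes a hypothetical encoding in which a constraint $s: C$ is the tuple `("concept", s, C)` and complex concepts are tuples such as `("and", C1, C2)` and `("or", C1, C2)`; a deterministic rule returns one successor system, a nondeterministic one returns the list of alternatives.

```python
def apply_and_rule(system):
    """Conjunction rule: return the unique successor system, or None if not applicable."""
    for c in sorted(system, key=repr):               # fixed order, for reproducibility
        if c[0] == "concept" and isinstance(c[2], tuple) and c[2][0] == "and":
            _, s, (_, c1, c2) = c
            new = {("concept", s, c1), ("concept", s, c2)}
            if not new <= system:                    # condition 2: not both present
                return frozenset(system) | new
    return None


def apply_or_rule(system):
    """Disjunction rule: return all alternative successor systems (empty if not applicable)."""
    for c in sorted(system, key=repr):
        if c[0] == "concept" and isinstance(c[2], tuple) and c[2][0] == "or":
            _, s, (_, c1, c2) = c
            # condition 2: neither disjunct is already asserted for s
            if ("concept", s, c1) not in system and ("concept", s, c2) not in system:
                return [frozenset(system) | {("concept", s, d)} for d in (c1, c2)]
    return []
```

A search procedure built on such functions would follow the single result of `apply_and_rule` but would have to explore (or guess among) the alternatives returned by `apply_or_rule`.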

One can verify that rules are always applied to a system $S$ either because of the presence
in $S$ of a given constraint $s: C$ (condition 1), or, in the case of the $\rightarrow_{\forall x}$-rule, because of the
presence of an object $s$ in $S$. When no confusion arises, we will say that a rule is applied
to the constraint $s: C$ or the object $s$ (instead of saying that it is applied to the constraint
system $S$).

Proposition 3.1 (Invariance) Let S and S' be constraint systems. Then: 

1. If S' is obtained from S by application of a deterministic rule, then S is satisfiable if 
and only if S' is satisfiable. 

2. If S' is obtained from S by application of a nondeterministic rule, then S is satisfi- 
able if S' is satisfiable. Conversely, if S is satisfiable and a nondeterministic rule is 
applicable to an object s in S , then it can be applied to s in such a way that it yields 
a satisfiable constraint system. 

Proof. The proof is mainly a rephrasing of typical soundness proofs for tableaux methods
(e.g., Fitting, 1990, Lemma 6.3.2). The only non-standard constructors are number
restrictions.

1. "$\Leftarrow$" Considering the deterministic rules one can directly check that $S$ is a subset of $S'$.
So it is obvious that $S$ is satisfiable if $S'$ is satisfiable.

"$\Rightarrow$" In order to show that $S'$ is satisfiable if this is the case for $S$ we consider in turn
each possible deterministic rule application leading from $S$ to $S'$. We assume that $(\mathcal{I}, \alpha)$
satisfies $S$.

If the $\rightarrow_{\sqcap}$-rule is applied to $s: C_1 \sqcap C_2$ in $S$, then $S' = S \cup \{s: C_1, s: C_2\}$. Since $(\mathcal{I}, \alpha)$
satisfies $s: C_1 \sqcap C_2$, $(\mathcal{I}, \alpha)$ satisfies $s: C_1$ and $s: C_2$ and therefore $S'$.

If the $\rightarrow_{\forall}$-rule is applied to $s: \forall R.C$, there must be an $R$-successor $t$ of $s$ in $S$ such that
$S' = S \cup \{t: C\}$. Since $(\mathcal{I}, \alpha)$ satisfies $S$, it holds that $(\alpha(s), \alpha(t)) \in R^{\mathcal{I}}$. Since $(\mathcal{I}, \alpha)$ satisfies
$s: \forall R.C$, it holds that $\alpha(t) \in C^{\mathcal{I}}$. So $(\mathcal{I}, \alpha)$ satisfies $t: C$ and therefore $S'$.

If the $\rightarrow_{\forall x}$-rule is applied to an $s$ because of the presence of $\forall x.x: C$ in $S$, then $S' =
S \cup \{s: C\}$. Since $(\mathcal{I}, \alpha)$ satisfies $S$ it holds that $C^{\mathcal{I}} = \Delta^{\mathcal{I}}$. Therefore $\alpha(s) \in C^{\mathcal{I}}$ and so
$(\mathcal{I}, \alpha)$ satisfies $S'$.

If the $\rightarrow_{\exists}$-rule is applied to $s: \exists R.C$, then $S' = S \cup \{sP_1y, \ldots, sP_ky, y: C\}$. Since $(\mathcal{I}, \alpha)$
satisfies $S$, there exists a $d$ such that $(\alpha(s), d) \in R^{\mathcal{I}}$ and $d \in C^{\mathcal{I}}$. We define the $\mathcal{I}$-assignment
$\alpha'$ as $\alpha'(y) := d$ and $\alpha'(t) := \alpha(t)$ for $t \neq y$. It is easy to show that $(\mathcal{I}, \alpha')$ satisfies $S'$.

If the $\rightarrow_{\geq}$-rule is applied to $s: (\geq n\, R)$, then $S' = S \cup \{sP_1y_i, \ldots, sP_ky_i \mid i \in 1..n\} \cup
\{y_i \neq y_j \mid i,j \in 1..n,\ i \neq j\}$. Since $(\mathcal{I}, \alpha)$ satisfies $S$, there exist $n$ distinct elements
$d_1, \ldots, d_n \in \Delta^{\mathcal{I}}$ such that $(\alpha(s), d_i) \in R^{\mathcal{I}}$. We define the $\mathcal{I}$-assignment $\alpha'$ as $\alpha'(y_i) := d_i$
for $i \in 1..n$ and $\alpha'(t) := \alpha(t)$ for $t \notin \{y_1, \ldots, y_n\}$. It is easy to show that $(\mathcal{I}, \alpha')$ satisfies $S'$.

2. "$\Leftarrow$" Assume that $S'$ is satisfied by $(\mathcal{I}, \alpha')$. We show that $S$ is also satisfiable. If $S'$
is obtained from $S$ by application of the $\rightarrow_{\sqcup}$-rule, then $S$ is a subset of $S'$ and therefore
satisfied by $(\mathcal{I}, \alpha')$.

If $S'$ is obtained from $S$ by application of the $\rightarrow_{\leq}$-rule to $s: (\leq n\, R)$ in $S$, then there
are $y, t$ in $S$ such that $S' = S[y/t]$. We define the $\mathcal{I}$-assignment $\alpha$ as $\alpha(y) := \alpha'(t)$ and
$\alpha(v) := \alpha'(v)$ for every object $v$ with $v \neq y$. Obviously $(\mathcal{I}, \alpha)$ satisfies $S$.

"$\Rightarrow$" Now suppose that $S$ is satisfied by $(\mathcal{I}, \alpha)$ and a nondeterministic rule is applicable
to an object $s$.

If the $\rightarrow_{\sqcup}$-rule is applicable to $s: C_1 \sqcup C_2$ then, since $S$ is satisfiable, $\alpha(s) \in (C_1 \sqcup C_2)^{\mathcal{I}}$.
It follows that either $\alpha(s) \in C_1^{\mathcal{I}}$ or $\alpha(s) \in C_2^{\mathcal{I}}$ (or both). Hence, the $\rightarrow_{\sqcup}$-rule can obviously
be applied in a way such that $(\mathcal{I}, \alpha)$ satisfies the resulting constraint system $S'$.

If the $\rightarrow_{\leq}$-rule is applicable to $s: (\leq n\, R)$, then, since $(\mathcal{I}, \alpha)$ satisfies $S$, it holds that
$\alpha(s) \in (\leq n\, R)^{\mathcal{I}}$ and therefore the set $\{d \in \Delta^{\mathcal{I}} \mid (\alpha(s), d) \in R^{\mathcal{I}}\}$ has at most $n$ elements.
On the other hand, there are more than $n$ $R$-successors of $s$ in $S$ and for each $R$-successor $t$
of $s$ we have $(\alpha(s), \alpha(t)) \in R^{\mathcal{I}}$. Thus, we can conclude by the Pigeonhole Principle (see e.g.,
Lewis & Papadimitriou, 1981, page 26) that there exist at least two $R$-successors $t, t'$ of $s$
such that $\alpha(t) = \alpha(t')$. Since $(\mathcal{I}, \alpha)$ satisfies $S$, the constraint $t \neq t'$ is not in $S$. Therefore
one of the two must be a variable, let's say $t' = y$. Now obviously $(\mathcal{I}, \alpha)$ satisfies $S[y/t]$. □

Given a constraint system S , more than one rule might be applicable to it. We define 
the following strategy for the application of rules: 

1. apply a rule to a variable only if no rule is applicable to individuals; 

2. apply a rule to a variable $x$ only if no rule is applicable to a variable $y$ such that $y \prec x$;

3. apply generating rules only if no nongenerating rule is applicable.

The above strategy ensures that the variables are processed one at a time according to
the ordering $\prec$.

From this point on, we assume that rules are always applied according to this strategy
and that we always start with a constraint system $S_\Sigma$ coming from an $\mathcal{ALCNR}$-knowledge
base $\Sigma$. The following lemma is a direct consequence of these assumptions.

Lemma 3.2 (Stability) Let $S$ be a constraint system and $x$ be a variable in $S$. Let a
generating rule be applicable to $x$ according to the strategy. Let $S'$ be any constraint system
derivable from $S$ by any sequence (possibly empty) of applications of rules. Then

1. No rule is applicable in $S'$ to a variable $y$ with $y \prec x$

2. $\sigma(S,x) = \sigma(S',x)$

3. If $y$ is a variable in $S$ with $y \prec x$ then $y$ is a variable in $S'$, i.e., the variable $y$ is not
substituted by another variable or by a constant.

Proof. 1. By contradiction: Suppose $S = S_0 \rightarrow_* S_1 \rightarrow_* \cdots \rightarrow_* S_n = S'$, where $* \in
\{\sqcup, \sqcap, \exists, \forall, \geq, \leq, \forall x\}$ and a rule is applicable to a variable $y$ such that $y \prec x$ in $S'$. Then
there exists a minimal $i$, where $i \leq n$, such that this is the case in $S_i$. Note that $i \neq 0$; in
fact, because of the strategy, if a rule is applicable to $x$ in $S$ no rule is applicable to $y$ in $S$.
So no rule is applicable to any variable $z$ such that $z \prec x$ in $S_0, \ldots, S_{i-1}$. It follows that
from $S_{i-1}$ to $S_i$ a rule is applied to $x$ or to a variable $w$ such that $x \prec w$. By an exhaustive
analysis of all rules we see that, whichever rule is applied from $S_{i-1}$ to $S_i$, no new
constraint of the form $y: C$ or $yRz$ can be added to $S_{i-1}$, and therefore no rule is applicable
to $y$ in $S_i$, contradicting the assumption.

2. By contradiction: Suppose $\sigma(S,x) \neq \sigma(S',x)$. Call $y$ the direct predecessor of $x$, then a
rule must have been applied either to $y$ or to $x$ itself. Obviously we have $y \prec x$, therefore
the former case cannot be because of point 1. A case analysis shows that the only rules
which can have been applied to $x$ are generating ones and the $\rightarrow_{\forall}$ and the $\rightarrow_{\leq}$ rules. But
these rules add new constraints only to the direct successors of $x$ and not to $x$ itself and
therefore do not change $\sigma(\cdot,x)$.

3. This follows from point 1. and the strategy. □ 

Lemma 3.2 proves that for a variable $x$ which has a direct successor, $\sigma(\cdot,x)$ is stable,
i.e., it will not change because of subsequent applications of rules. In fact, if a variable
has a direct successor it means that a generating rule has been applied to it, therefore
(Lemma 3.2.2) from that point on $\sigma(\cdot,x)$ does not change.

A constraint system is complete if no propagation rule applies to it. A complete system
derived from a system $S$ is also called a completion of $S$. A clash is a constraint system
having one of the following forms:

• $\{s: A,\ s: \neg A\}$, where $A$ is a concept name.

• $\{s: (\leq n\, R)\} \cup \{sP_1t_i, \ldots, sP_kt_i \mid i \in 1..n+1\}
  \cup \{t_i \neq t_j \mid i,j \in 1..n+1,\ i \neq j\}$,
  where $R = P_1 \sqcap \ldots \sqcap P_k$.

A clash is evidently an unsatisfiable constraint system. For example, the last case 
represents the situation in which an object has an at-most restriction and a set of R- 
successors that cannot be identified (either because they are individuals or because they 
have been created by some at-least restrictions). 
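
The two clash forms listed above can be detected mechanically. The following sketch is illustrative only and rests on assumptions that are not in the text: constraints are tuples `("concept", s, C)`, `("role", s, P, t)` and `("neq", s, t)`; a negated concept name is `("not", A)`; an at-most restriction is `("atmost", n, roles)` with `roles` the tuple of role names $P_1, \ldots, P_k$; and $t$ is taken to be an $R$-successor of $s$ when $sP_it$ is in the system for every $P_i$.

```python
from itertools import combinations


def r_successors(system, s, roles):
    """Objects t such that s P t is in the system for every role name P in roles."""
    candidates = {c[3] for c in system if c[0] == "role" and c[1] == s}
    return {t for t in candidates if all(("role", s, p, t) in system for p in roles)}


def separated(system, a, b):
    """True if a separation constraint between a and b is in the system."""
    return ("neq", a, b) in system or ("neq", b, a) in system


def has_clash(system):
    for c in system:
        if c[0] != "concept" or not isinstance(c[2], tuple):
            continue
        _, s, concept = c
        # First form: s: A together with s: not A.
        if concept[0] == "not" and ("concept", s, concept[1]) in system:
            return True
        # Second form: an at-most restriction with n + 1 pairwise separated successors.
        if concept[0] == "atmost":
            _, n, roles = concept
            successors = sorted(r_successors(system, s, roles))
            for group in combinations(successors, n + 1):
                if all(separated(system, a, b) for a, b in combinations(group, 2)):
                    return True
    return False
```

The search over `combinations` is exponential in the worst case; it is meant to mirror the definition, not to be efficient.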

Any constraint system containing a clash is obviously unsatisfiable. The purpose of the
calculus is to generate completions, and look for the presence of clashes inside. If $S$ is a
completion of $S_\Sigma$ and $S$ contains no clash, we prove that it is always possible to construct
a model for $\Sigma$ on the basis of $S$. Before looking at the technical details of the proof, let us
consider an example of application of the calculus for checking satisfiability.

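Example 3.3 Consider the following knowledge base $\Sigma = \langle \mathcal{T}, \mathcal{A} \rangle$:

$\mathcal{T} = \{\mathsf{Italian} \sqsubseteq \exists\mathsf{FRIEND}.\mathsf{Italian}\}$

$\mathcal{A} = \{\mathsf{FRIEND}(\mathsf{peter}, \mathsf{susan}),$
$\quad \forall\mathsf{FRIEND}.\neg\mathsf{Italian}(\mathsf{peter}),$
$\quad \exists\mathsf{FRIEND}.\mathsf{Italian}(\mathsf{susan})\}$

The corresponding constraint system $S_\Sigma$ is:

$S_\Sigma = \{\forall x.x: \neg\mathsf{Italian} \sqcup \exists\mathsf{FRIEND}.\mathsf{Italian},$
$\quad \mathsf{peter}\,\mathsf{FRIEND}\,\mathsf{susan},$
$\quad \mathsf{peter}: \forall\mathsf{FRIEND}.\neg\mathsf{Italian},$
$\quad \mathsf{susan}: \exists\mathsf{FRIEND}.\mathsf{Italian},$
$\quad \mathsf{peter} \neq \mathsf{susan}\}$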

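A sequence of applications of the propagation rules to $S_\Sigma$ is as follows:

$S_1 = S_\Sigma \cup \{\mathsf{susan}: \neg\mathsf{Italian}\}$   ($\rightarrow_{\forall}$-rule)
$S_2 = S_1 \cup \{\mathsf{peter}: \neg\mathsf{Italian} \sqcup \exists\mathsf{FRIEND}.\mathsf{Italian}\}$   ($\rightarrow_{\forall x}$-rule)
$S_3 = S_2 \cup \{\mathsf{susan}: \neg\mathsf{Italian} \sqcup \exists\mathsf{FRIEND}.\mathsf{Italian}\}$   ($\rightarrow_{\forall x}$-rule)
$S_4 = S_3 \cup \{\mathsf{peter}: \neg\mathsf{Italian}\}$   ($\rightarrow_{\sqcup}$-rule)
$S_5 = S_4 \cup \{\mathsf{susan}\,\mathsf{FRIEND}\,x,\ x: \mathsf{Italian}\}$   ($\rightarrow_{\exists}$-rule)
$S_6 = S_5 \cup \{x: \neg\mathsf{Italian} \sqcup \exists\mathsf{FRIEND}.\mathsf{Italian}\}$   ($\rightarrow_{\forall x}$-rule)
$S_7 = S_6 \cup \{x: \exists\mathsf{FRIEND}.\mathsf{Italian}\}$   ($\rightarrow_{\sqcup}$-rule)
$S_8 = S_7 \cup \{x\,\mathsf{FRIEND}\,y,\ y: \mathsf{Italian}\}$   ($\rightarrow_{\exists}$-rule)
$S_9 = S_8 \cup \{y: \neg\mathsf{Italian} \sqcup \exists\mathsf{FRIEND}.\mathsf{Italian}\}$   ($\rightarrow_{\forall x}$-rule)
$S_{10} = S_9 \cup \{y: \exists\mathsf{FRIEND}.\mathsf{Italian}\}$   ($\rightarrow_{\sqcup}$-rule)
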

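One can verify that $S_{10}$ is a complete clash-free constraint system. In particular, the $\rightarrow_{\exists}$-
rule is not applicable to $y$. In fact, since $x \equiv_{S_{10}} y$ condition 5 is not satisfied. From $S_{10}$ one
can build an interpretation $\mathcal{I}$, as follows (again, we give only the interpretation of concept
and role names):

$\Delta^{\mathcal{I}} = \{\mathsf{peter}, \mathsf{susan}, x, y\}$
$\mathsf{peter}^{\mathcal{I}} = \mathsf{peter}$, $\mathsf{susan}^{\mathcal{I}} = \mathsf{susan}$, $\alpha(x) = x$, $\alpha(y) = y$,
$\mathsf{Italian}^{\mathcal{I}} = \{x, y\}$

$\mathsf{FRIEND}^{\mathcal{I}} = \{(\mathsf{peter}, \mathsf{susan}), (\mathsf{susan}, x), (x, y), (y, y)\}$

It is easy to see that $\mathcal{I}$ is indeed a model for $\Sigma$. □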
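
The claim that the interpretation of Example 3.3 is a model can be checked directly. The snippet below hard-codes the domain and the extensions of Italian and FRIEND given in the example; the helper names are hypothetical and the check covers only this one knowledge base.

```python
domain = {"peter", "susan", "x", "y"}
italian = {"x", "y"}
friend = {("peter", "susan"), ("susan", "x"), ("x", "y"), ("y", "y")}


def exists_friend_italian(d):
    """d belongs to the extension of (exists FRIEND.Italian)."""
    return any((d, e) in friend and e in italian for e in domain)


def forall_friend_not_italian(d):
    """d belongs to the extension of (forall FRIEND.(not Italian))."""
    return all(e not in italian for e in domain if (d, e) in friend)


checks = {
    "Italian included in exists FRIEND.Italian": all(exists_friend_italian(d) for d in italian),
    "FRIEND(peter, susan)": ("peter", "susan") in friend,
    "forall FRIEND.not Italian (peter)": forall_friend_not_italian("peter"),
    "exists FRIEND.Italian (susan)": exists_friend_italian("susan"),
}
print(all(checks.values()))  # True
```

The pair `("y", "y")` is what makes the inclusion hold for `y`: it is the role-pair that `y` inherits from the successors of `x`.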

In order to prove that it is always possible to obtain an interpretation from a complete
clash-free constraint system we need some additional notions. Let $S$ be a constraint system
and $x$, $w$ variables in $S$. We call $w$ a witness of $x$ in $S$ if the three following conditions hold:

1. $x \equiv_S w$

2. $w \prec x$

3. there is no variable $z$ such that $z \prec w$ and $z$ satisfies conditions 1. and 2., i.e., $w$ is
the least variable w.r.t. $\prec$ satisfying conditions 1. and 2.

We say $x$ is blocked (by $w$) in $S$ if $x$ has a witness ($w$) in $S$. The following lemma states a
property of witnesses.
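
The three conditions translate into a short search. The sketch below assumes a hypothetical encoding in which $s: C$ is the tuple `("concept", s, C)` and the ordering on variables is given as a Python list `order` sorted from least to greatest; it is tried on the variables $x$ and $y$ of Example 3.3.

```python
def sigma(system, s):
    """The set of concepts C such that s: C is in the system."""
    return frozenset(c[2] for c in system if c[0] == "concept" and c[1] == s)


def witness(system, x, order):
    """Return the least variable w before x with sigma(w) == sigma(x), or None."""
    target = sigma(system, x)
    for w in order[:order.index(x)]:      # only variables preceding x (condition 2)
        if sigma(system, w) == target:    # condition 1; first hit is least (condition 3)
            return w
    return None


# Concept constraints on x and y in the completion of Example 3.3.
OR = ("or", ("not", "Italian"), ("exists", "FRIEND", "Italian"))
EX = ("exists", "FRIEND", "Italian")
constraints = {("concept", v, c) for v in ("x", "y") for c in ("Italian", OR, EX)}
order = ["x", "y"]

print(witness(constraints, "y", order))  # x
print(witness(constraints, "x", order))  # None
```

So `y` is blocked by `x`, while `x` itself is not blocked.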

Lemma 3.4 Let $S$ be a constraint system, $x$ a variable in $S$. If $x$ is blocked then

1. $x$ has no direct successor and

2. $x$ has exactly one witness.

Proof. 1. By contradiction: Suppose that $x$ is blocked in $S$ and $xPy$ is in $S$. During the
completion process leading to $S$ a generating rule must have been applied to $x$ in a system
$S'$. It follows from the definition of the rules that in $S'$ for every variable $w \prec x$ we had
$x \not\equiv_{S'} w$. Now from Lemma 3.2 we know that for the constraint system $S$ derivable from
$S'$ and for every $w \prec x$ in $S$ we also have $x \not\equiv_S w$. Hence there is no witness for $x$ in $S$,
contradicting the hypothesis that $x$ is blocked.

2. This follows directly from condition 3. for a witness. □

As a consequence of Lemma 3.4, in a constraint system $S$, if $w$ is a witness of $x$ then $w$
cannot have a witness itself, since both the relations $\prec$ and S-equivalence are transitive.
The uniqueness of the witness for a blocked variable is important for defining the following
particular interpretation out of $S$.

Let $S$ be a constraint system. We define the canonical interpretation $\mathcal{I}_S$ and the canonical
$\mathcal{I}_S$-assignment $\alpha_S$ as follows:

1. $\Delta^{\mathcal{I}_S} := \{s \mid s \text{ is an object in } S\}$

2. $\alpha_S(s) := s$

3. $s \in A^{\mathcal{I}_S}$ if and only if $s: A$ is in $S$

4. $(s,t) \in P^{\mathcal{I}_S}$ if and only if

   (a) $sPt$ is in $S$ or

   (b) $s$ is a blocked variable, $w$ is the witness of $s$ in $S$ and $wPt$ is in $S$.

We call $(s,t)$ a P-role-pair of $s$ in $\mathcal{I}_S$ if $(s,t) \in P^{\mathcal{I}_S}$, and we call $(s,t)$ a role-pair of $s$ in $\mathcal{I}_S$
if $(s,t)$ is a P-role-pair for some role $P$. We call a role-pair explicit if it comes up from case
4.(a) of the definition of the canonical interpretation and we call it implicit if it comes up
from case 4.(b).
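
Point 4 of the definition can be sketched as follows. The encoding is an assumption made for illustration: a role constraint $sPt$ is the tuple `("role", s, P, t)`, and the witnesses of the blocked variables are passed in as a dictionary `witness_of` (a hypothetical parameter mapping each blocked variable to its witness). Role-pairs are returned as triples `(P, s, t)`.

```python
def canonical_role_pairs(system, witness_of):
    """Return (explicit, implicit) role-pairs of the canonical interpretation."""
    # Case 4.(a): every role constraint s P t in the system.
    explicit = {(c[2], c[1], c[3]) for c in system if c[0] == "role"}
    # Case 4.(b): a blocked variable inherits the role constraints of its witness.
    implicit = {
        (p, blocked, t)
        for blocked, w in witness_of.items()
        for (p, s, t) in explicit
        if s == w
    }
    return explicit, implicit


# Role constraints of the completion in Example 3.3, where y is blocked by x.
roles = {
    ("role", "peter", "FRIEND", "susan"),
    ("role", "susan", "FRIEND", "x"),
    ("role", "x", "FRIEND", "y"),
}
explicit, implicit = canonical_role_pairs(roles, {"y": "x"})
print(sorted(implicit))  # [('FRIEND', 'y', 'y')]
```

Together the two sets give the four FRIEND pairs listed in Example 3.3.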

From Lemma 3.4 it is obvious that a role-pair cannot be both explicit and implicit. 
Moreover, if a variable has an implicit role-pair then all its role-pairs are implicit and they 
all come from exactly one witness, as stated by the following lemma. 

Lemma 3.5 Let $S$ be a completion and $x$ a variable in $S$. Let $\mathcal{I}_S$ be the canonical
interpretation for $S$. If $x$ has an implicit role-pair $(x,y)$, then

1. all role-pairs of $x$ in $\mathcal{I}_S$ are implicit

2. there is exactly one witness $w$ of $x$ in $S$ such that for all roles $P$ in $S$ and all P-role-
pairs $(x,y)$ of $x$, the constraint $wPy$ is in $S$.

Proof. The first statement follows from Lemma 3.4 (point 1). The second statement follows
from Lemma 3.4 (point 2) together with the definition of $\mathcal{I}_S$. □

We have now all the machinery needed to prove the main theorem of this subsection. 

Theorem 3.6 Let $S$ be a complete constraint system. If $S$ contains no clash then it is
satisfiable.

Proof. Let $\mathcal{I}_S$ and $\alpha_S$ be the canonical interpretation and canonical $\mathcal{I}_S$-assignment for $S$.
We prove that the pair $(\mathcal{I}_S, \alpha_S)$ satisfies every constraint $c$ in $S$. If $c$ has the form $sPt$ or
$s \neq t$, then $(\mathcal{I}_S, \alpha_S)$ satisfies them by definition of $\mathcal{I}_S$ and $\alpha_S$. Considering the $\rightarrow_{\geq}$-rule and
the $\rightarrow_{\leq}$-rule we see that a constraint of the form $s \neq s$ cannot be in $S$. If $c$ has the form
$s: C$, we show by induction on the structure of $C$ that $s \in C^{\mathcal{I}_S}$.

We first consider the base cases. If $C$ is a concept name, then $s \in C^{\mathcal{I}_S}$ by definition
of $\mathcal{I}_S$. If $C = \top$, then obviously $s \in \top^{\mathcal{I}_S}$. The case that $C = \bot$ cannot occur since $S$ is
clash-free.

Next we analyze in turn each possible complex concept $C$. If $C$ is of the form $\neg C_1$ then
$C_1$ is a concept name since all concepts are simple. Then the constraint $s: C_1$ is not in $S$
since $S$ is clash-free. Then $s \notin C_1^{\mathcal{I}_S}$, that is, $s \in \Delta^{\mathcal{I}_S} \setminus C_1^{\mathcal{I}_S}$. Hence $s \in (\neg C_1)^{\mathcal{I}_S}$.

If $C$ is of the form $C_1 \sqcap C_2$ then (since $S$ is complete) $s: C_1$ is in $S$ and $s: C_2$ is in $S$. By
induction hypothesis, $s \in C_1^{\mathcal{I}_S}$ and $s \in C_2^{\mathcal{I}_S}$. Hence $s \in (C_1 \sqcap C_2)^{\mathcal{I}_S}$.

If $C$ is of the form $C_1 \sqcup C_2$ then (since $S$ is complete) either $s: C_1$ is in $S$ or $s: C_2$ is in
$S$. By induction hypothesis, either $s \in C_1^{\mathcal{I}_S}$ or $s \in C_2^{\mathcal{I}_S}$. Hence $s \in (C_1 \sqcup C_2)^{\mathcal{I}_S}$.

If $C$ is of the form $\forall R.D$, we have to show that for all $t$ with $(s,t) \in R^{\mathcal{I}_S}$ it holds that
$t \in D^{\mathcal{I}_S}$. If $(s,t) \in R^{\mathcal{I}_S}$, then according to Lemma 3.5 two cases can occur. Either $t$ is an
$R$-successor of $s$ in $S$ or $s$ is blocked by a witness $w$ in $S$ and $t$ is an $R$-successor of $w$ in $S$.
In the first case $t: D$ must also be in $S$ since $S$ is complete. Then by induction hypothesis
we have $t \in D^{\mathcal{I}_S}$. In the second case by definition of witness, $w: \forall R.D$ is in $S$ and then
because of completeness of $S$, $t: D$ must be in $S$. By induction hypothesis we have again
$t \in D^{\mathcal{I}_S}$.

If $C$ is of the form $\exists R.D$ we have to show that there exists a $t \in \Delta^{\mathcal{I}_S}$ with $(s,t) \in R^{\mathcal{I}_S}$
and $t \in D^{\mathcal{I}_S}$. Since $S$ is complete, either there is a $t$ that is an $R$-successor of $s$ in $S$ and
$t: D$ is in $S$, or $s$ is a variable blocked by a witness $w$ in $S$. In the first case, by induction
hypothesis and the definition of $\mathcal{I}_S$, we have $t \in D^{\mathcal{I}_S}$ and $(s,t) \in R^{\mathcal{I}_S}$. In the second case
$w: \exists R.D$ is in $S$. Since $w$ cannot be blocked and $S$ is complete, we have that there is a
$t$ that is an $R$-successor of $w$ in $S$ and $t: D$ is in $S$. So by induction hypothesis we have
$t \in D^{\mathcal{I}_S}$ and by the definition of $\mathcal{I}_S$ we have $(s,t) \in R^{\mathcal{I}_S}$.

If $C$ is of the form $(\leq n\, R)$ we show the goal by contradiction. Assume that $s \notin (\leq
n\, R)^{\mathcal{I}_S}$. Then there exist at least $n+1$ distinct objects $t_1, \ldots, t_{n+1}$ with $(s,t_i) \in R^{\mathcal{I}_S}$, $i \in
1..n+1$. This means that, since $R = P_1 \sqcap \ldots \sqcap P_k$, there are pairs $(s,t_i) \in P_j^{\mathcal{I}_S}$, where
$i \in 1..n+1$ and $j \in 1..k$. Then according to Lemma 3.5 one of the two following cases must
occur. Either all $sP_jt_i$ for $j \in 1..k$, $i \in 1..n+1$ are in $S$, or there exists a witness $w$ of $s$ in
$S$ such that all $wP_jt_i$ for $j \in 1..k$ and $i \in 1..n+1$ are in $S$. In the first case the $\rightarrow_{\leq}$-rule cannot
be applicable because of completeness. This means that all the $t_i$'s are pairwise separated,
i.e., that $S$ contains the constraints $t_i \neq t_j$, $i,j \in 1..n+1$, $i \neq j$. This contradicts the fact
that $S$ is clash-free. The second case leads to an analogous contradiction.

If $C$ is of the form $(\geq n\, R)$ we show the goal by contradiction. Assume that $s \notin (\geq
n\, R)^{\mathcal{I}_S}$. Then there exist at most $m < n$ ($m$ possibly 0) distinct objects $t_1, \ldots, t_m$ with
$(s,t_i) \in R^{\mathcal{I}_S}$, $i \in 1..m$. We have to consider two cases. First case: $s$ is not blocked in
$S$. Since there are only $m$ $R$-successors of $s$ in $S$, the $\rightarrow_{\geq}$-rule is applicable to $s$. This
contradicts the fact that $S$ is complete. Second case: $s$ is blocked by a witness $w$ in $S$.
Since there are $m$ $R$-successors of $w$ in $S$, the $\rightarrow_{\geq}$-rule is applicable to $w$. But this leads to
the same contradiction.

If $c$ has the form $\forall x.x: D$ then, since $S$ is complete, for each object $t$ in $S$, $t: D$ is in
$S$ and, by the previous cases, $t \in D^{\mathcal{I}_S}$. Therefore, the pair $(\mathcal{I}_S, \alpha_S)$ satisfies $\forall x.x: D$.
Finally, since $(\mathcal{I}_S, \alpha_S)$ satisfies all constraints in $S$, $(\mathcal{I}_S, \alpha_S)$ satisfies $S$. □

Theorem 3.7 (Correctness) A constraint system S is satisfiable if and only if there exists 
at least one clash-free completion of S . 

Proof. "$\Leftarrow$" Follows immediately from Theorem 3.6. "$\Rightarrow$" Clearly, a system containing
a clash is unsatisfiable. If every completion of $S$ is unsatisfiable, then from Proposition 3.1
$S$ is unsatisfiable. □

3.2 Termination and complexity of the calculus 

Given a constraint system $S$, we call $n_S$ the number of concepts appearing in $S$, including
also all the concepts appearing as a substring of another concept. Notice that $n_S$ is bounded
by the length of the string expressing $S$.

Lemma 3.8 Let $S$ be a constraint system and let $S'$ be derived from $S$ by means of the
propagation rules. In any set of variables in $S'$ including more than $2^{n_S}$ variables there are
at least two variables $x, y$ such that $x \equiv_{S'} y$.

Proof. Each constraint $x: C \in S'$ may contain only concepts of the constraint system $S$.
Since there are $n_S$ such concepts, given a variable $x$ there cannot be more than $2^{n_S}$ different
sets of constraints $x: C$ in $S'$. □

Lemma 3.9 Let $S$ be a constraint system and let $S'$ be any constraint system derived from
$S$ by applying the propagation rules with the given strategy. Then, in $S'$ there are at most
$2^{n_S}$ non-blocked variables.

Proof. Suppose there are $2^{n_S} + 1$ non-blocked variables. From Lemma 3.8, we know that
in $S'$ there are at least two variables $y_1, y_2$ such that $y_1 \equiv_{S'} y_2$. Obviously either $y_1 \prec y_2$ or
$y_2 \prec y_1$ holds; suppose that $y_1 \prec y_2$. From the definitions of witness and blocked either $y_1$
is a witness of $y_2$ or there exists a variable $y_3$ such that $y_3 \prec y_1$ and $y_3$ is a witness of $y_2$.
In both cases $y_2$ is blocked, contradicting the hypothesis. □

Theorem 3.10 (Termination and space complexity) Let $\Sigma$ be an $\mathcal{ALCNR}$-knowledge
base and let $n$ be its size. Every completion of $S_\Sigma$ is finite and its size is $O(2^{4n})$.

Proof. Let $S$ be a completion of $S_\Sigma$. From Lemma 3.9 it follows that there are at most $2^n$
non-blocked variables in $S$. Therefore there are at most $m \times 2^n$ total variables in $S$, where
$m$ is the maximum number of direct successors for a variable in $S$.

Observe that $m$ is bounded by the number of $\exists R.C$ concepts (at most $n$) plus the sum of
all numbers appearing in number restrictions. Since these numbers are expressed in binary,
their sum is bounded by $2^n$. Hence, $m \leq 2^n + n$. Since the number of individuals is also
bounded by $n$, the total number of objects in $S$ is at most $m \times (2^n + n) \leq (2^n + n) \times (2^n + n)$,
that is, $O(2^{2n})$.

The number of different constraints of the form $s: C$, $\forall x.x: C$ in which each object $s$ can
be involved is bounded by $n$, and each constraint has size linear in $n$. Hence, the total size
of these constraints is bounded by $n \times n \times 2^{2n}$, that is $O(2^{3n})$.

The number of constraints of the form $sPt$, $s \neq t$ is bounded by $(2^{2n})^2 = 2^{4n}$, and each
constraint has constant size.

In conclusion, we have that the size of $S$ is $O(2^{4n})$. □
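
The counting steps of the proof can be collected in one place. The display below only restates the bounds derived above, with $m$ the maximum number of direct successors of a variable and $n$ the size of the knowledge base; it adds no new estimate.

```latex
\begin{align*}
\#\{\text{non-blocked variables}\} &\le 2^{n} && \text{(Lemma 3.9)}\\
m &\le 2^{n} + n && \text{(existential concepts plus binary-coded numbers)}\\
\#\{\text{objects}\} &\le m\,(2^{n}+n) \le (2^{n}+n)^{2} = O(2^{2n})\\
\text{size of the } s{:}C \text{ and } \forall x.x{:}C \text{ constraints} &\le n \cdot n \cdot 2^{2n} = O(2^{3n})\\
\#\{sPt,\; s \neq t\} &\le (2^{2n})^{2} = 2^{4n}
\end{align*}
```

The last line dominates the others, which gives the overall $O(2^{4n})$ bound.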

Notice that the above is just a coarse upper bound, obtained for theoretical purposes.
In practical cases we expect the actual size to be much smaller than that. For example,
if the numbers involved in number restrictions were either expressed in unary notation, or
limited by a constant (the latter being a reasonable restriction in practical systems) then
an argument analogous to the above one would lead to a bound of $2^{3n}$.

Theorem 3.11 (Decidability) Given an $\mathcal{ALCNR}$-knowledge base $\Sigma$, checking whether $\Sigma$
is satisfiable is a decidable problem.

Proof. This follows from Theorems 3.7 and 3.10 and the fact that $\Sigma$ is satisfiable if and
only if $S_\Sigma$ is satisfiable. □

We can refine the above theorem, by giving tighter bounds on the time required to 
decide satisfiability. 

Theorem 3.12 (Time complexity) Given an $\mathcal{ALCNR}$-knowledge base $\Sigma$, checking
whether $\Sigma$ is satisfiable can be done in nondeterministic exponential time.

Proof. In order to prove the claim it is sufficient to show that each completion is obtained
with an exponential number of applications of rules. Since the number of constraints of
each completion is exponential (Theorem 3.10) and each rule, except the $\rightarrow_{\leq}$-rule, adds new
constraints to the constraint system, it follows that all such rules are applied at most an
exponential number of times. Regarding the $\rightarrow_{\leq}$-rule, it is applied for each object at most as
many times as the number of its direct successors. Since this number is at most exponential
(if numbers are coded in binary) w.r.t. the size of the knowledge base, the claim follows. □

A lower bound of the complexity of KB-satisfiability is obtained exploiting previous
results about the language $\mathcal{ALC}$, which is a sublanguage of $\mathcal{ALCNR}$ that does not include
number restrictions and role conjunction. We know from McAllester (1991), and (indepen-
dently) from an observation by Nutt (1992) that KB-satisfiability in $\mathcal{ALC}$-knowledge bases
is EXPTIME-hard (see (Garey & Johnson, 1979, page 183) for a definition) and hence it
is hard for $\mathcal{ALCNR}$-knowledge bases, too. Hence, we do not expect to find any algorithm
solving the problem in polynomial space, unless PSPACE=EXPTIME. Therefore, we do
not expect to substantially improve space complexity of our calculus, which already works
in exponential space. We now discuss possible improvements on time complexity.

The proposed calculus works in nondeterministic exponential time, and hence improves
the one we proposed in (Buchheit, Donini, & Schaerf, 1993, Sec. 4), which works in deter-
ministic double exponential time. The key improvement is that we showed that a KB has
a model if and only if it has a model of exponential size. However, it may be argued that
as it is, the calculus cannot yet be turned into a practical procedure, since such a proce-
dure would simply simulate nondeterminism by a second level of exponentiality, resulting
in a double exponential time procedure. However, the different combinations of concepts
are only exponentially many (this is just the cardinality of the powerset of the set of con-
cepts). Hence, a double exponential time procedure wastes most of the time re-analyzing
over and over objects with different names yet with the same $\sigma(\cdot,\cdot)$, in different constraint
systems. This could be avoided if we allow a variable to be blocked by a witness that is
in a previously analyzed constraint system. This technique would be similar to the one
used in (Pratt, 1978), and to the tree-automata technique used in (Vardi & Wolper, 1986),
improving on simple tableaux methods for variants of propositional dynamic logics. Since
our calculus considers only one constraint system at a time, a modification of the calculus
would be necessary to accomplish this task in a formal way, which is outside the scope of
this paper. The formal development of such a deterministic exponential time procedure will
be a subject for future research.

Notice that, since the domain $\Delta^{\mathcal{I}_S}$ of the canonical interpretation is always finite, we
have also implicitly proved that $\mathcal{ALCNR}$-knowledge bases have the finite model property,
i.e., any satisfiable knowledge base has a finite model. This property has been extensively
studied in modal logics (Hughes & Cresswell, 1984) and dynamic logics (Harel, 1984). In
particular, a technique, called filtration, has been developed both to prove the finite model
property and to build a finite model for a satisfiable formula. This technique allows one to
build a finite model from an infinite one by grouping the worlds of a structure in equivalence
classes, based on the set of formulae that are satisfied in each world. It is interesting to
observe that our calculus, based on witnesses, can be considered as a variant of the filtration
technique where the equivalence classes are determined on the basis of our S-equivalence
relation. However, because of number restrictions, variables that are S-equivalent cannot
be grouped, since they might be separated (e.g., they might have been introduced by the
same application of the $\rightarrow_{\geq}$-rule). Nevertheless, they can have the same direct successors,
as stated in point 4.(b) of the definition of canonical interpretation on page 124. This would
correspond to grouping variables of an infinite model in such a way that separations are
preserved.

4. Relation to previous work 

In this section we discuss the relation of our paper to previous work about reasoning with in- 
clusions. In particular, we first consider previously proposed reasoning techniques that deal 
with inclusions and terminological cycles, then we discuss the relation between inclusions 
and terminological cycles. 

4.1 Reasoning Techniques 

As mentioned in the introduction, previous results were obtained by Baader et al. (1990), 
Baader (1990a, 1990b), Nebel (1990, 1991), Schild (1991) and Dionne et al. (1992, 1993). 

Nebel (1990, Chapter 5) considers the language $\mathcal{TF}$, containing concept conjunction,
universal quantification and number restrictions, and TBoxes containing (possibly cyclic)
concept definitions, role definitions and disjointness axioms (stating that two concept names
are disjoint). Nebel shows that subsumption of $\mathcal{TF}$-concepts w.r.t. a TBox is decidable.
However, the argument he uses is non-constructive: He shows that it is sufficient to con-
sider finite interpretations of a size bounded by the size of the TBox in order to decide
subsumption.

In (Baader, 1990b) the effect of the three types of semantics (descriptive, greatest fix-
point and least fixpoint semantics) for the language $\mathcal{FL}_0$, containing concept conjunction
and universal quantification, is described with the help of finite automata. Baader reduces
subsumption of $\mathcal{FL}_0$-concepts w.r.t. a TBox containing (possibly cyclic) definitions of the
form $A = C$ (which he calls terminological axioms) to decision problems for finite automata.
In particular, he shows that subsumption w.r.t. descriptive semantics can be decided in poly-
nomial space using Büchi automata. Using results from (Baader, 1990b), in (Nebel, 1991)
a characterization of the above subsumption problem w.r.t. descriptive semantics is given
with the help of deterministic automata (whereas Büchi automata are nondeterministic).
This also yields a PSPACE-algorithm for deciding subsumption.

In (Baader et al., 1990) the attention is restricted to the language $\mathcal{ALC}$. In particular,
that paper considers the problem of checking the satisfiability of a single equation of the
form $C = \top$, where $C$ is an $\mathcal{ALC}$-concept. This problem, called the universal satisfiabil-
ity problem, is shown to be equivalent to checking the satisfiability of an $\mathcal{ALC}$-TBox (see
Proposition 4.1).

In (Baader, 1990a), an extension of $\mathcal{ALC}$, called $\mathcal{ALC}_{reg}$, is introduced, which supports
a constructor to express the transitive closure of roles. By means of transitive closure of
roles it is possible to replace cyclic inclusions of the form $A \sqsubseteq D$ with equivalent acyclic
ones. The problem of checking the satisfiability of an $\mathcal{ALC}_{reg}$-concept is solved in that
paper. It is also shown that using transitive closure it is possible to reduce satisfiability
of an $\mathcal{ALC}$-concept w.r.t. an $\mathcal{ALC}$-TBox $\mathcal{T} = \{C_1 \sqsubseteq D_1, \ldots, C_n \sqsubseteq D_n\}$ into the concept
satisfiability problem in $\mathcal{ALC}_{reg}$ (w.r.t. the empty TBox). Since the problem of concept
satisfiability w.r.t. a TBox is trivially harder than checking the satisfiability of a TBox,
that paper extends the result given in (Baader et al., 1990).

The technique exploited in (Baader et al., 1990) and (Baader, 1990a) is based on the 
notion of concept tree. A concept tree is generated starting from a concept C in order 
to check its satisfiability (or universal satisfiability). The way a concept tree is generated 
from a concept C is similar in flavor to the way a complete constraint system is generated
from the constraint system {x:C}. However, the extension of the concept tree method to 
deal with number restrictions and individuals in the knowledge base is neither obvious, nor 
suggested in the cited papers; on the other hand, the extension of the calculus based on 
constraint systems is immediate, provided that additional features have a counterpart in 
First Order Logic. 

In (Schild, 1991) some results more general than those in (Baader, 1990a) are obtained
by considering languages more expressive than $\mathcal{ALC}_{reg}$ and dealing with the concept satisfia-
bility problem in such languages. The results are obtained by establishing a correspondence
between concept languages and Propositional Dynamic Logics (PDL), and reducing the 
given problem to a satisfiability problem in PDL. Such an approach allows Schild to find 
several new results exploiting known results in the PDL framework. However, it cannot be 
used to deal with every concept language. In fact, the correspondence cannot be established 
when the language includes some concept constructors having no counterpart in PDL (e.g., 
number restrictions, or individuals in an ABox).

Recently, an algebraic approach to cycles has been proposed in (Dionne et al., 1992), in
which (possibly cyclic) definitions are interpreted as determining an equivalence relation over
the terms describing concepts. The existence and uniqueness of such an equivalence relation
derives from Aczel's results on non-well founded sets. In (Dionne et al., 1993) the same
researchers prove that subsumption based on this approach is equivalent to subsumption in
greatest fixpoint semantics. The language analyzed is a small fragment of the one used in the
TKRS K-REP, and contains conjunction and existential-universal quantifications combined
into one construct (hence it is similar to $\mathcal{FL}_0$). The difficulty of extending these results
lies in the fact that it is not clear how individuals can be interpreted in this algebraic
setting. Moreover, we believe that constructive approaches like the algebraic one give
counterintuitive results when applied to non-constructive features of concept languages, such as
negation and number restrictions.

In conclusion, all these approaches, i.e., reduction to automata problems, concept trees, 
reduction to PDL and algebraic semantics, deal only with TBoxes and do not seem to be 
suitable for dealing with ABoxes as well. On the other hand, the constraint system technique, even 
though it was conceived for TBox-reasoning, can be easily extended to ABox-reasoning, as 
also shown in (Hollunder, 1990; Baader & Hollunder, 1991; Donini et al., 1993). 

4.2 Inclusions versus Concept Definitions 

Now we compare the expressive power of TBoxes defined as a set of inclusions (as done in 
this paper) and TBoxes defined as a set of (possibly cyclic) concept introductions of the 
form $A \mathrel{\dot{\leq}} D$ and $A \doteq D$. 

Unlike (Baader, 1990a) and (Schild, 1991), we consider reasoning problems dealing with 
TBox and ABox together. Moreover, we use the descriptive semantics for the concept 
introductions, as we do for inclusions. The result we have obtained is that inclusion statements 
and concept introductions actually have the same expressive power. In detail, we show that 
the satisfiability of a knowledge base $\Sigma = \langle \mathcal{A}, \mathcal{T} \rangle$, where $\mathcal{T}$ is a set of inclusion statements, 
can be reduced to the satisfiability of a knowledge base $\Sigma' = \langle \mathcal{A}', \mathcal{T}' \rangle$ such that $\mathcal{T}'$ is a set 
of concept introductions. The other direction, from concept introductions to inclusions, is 
trivial since introductions of the form $A \doteq D$ can be expressed by the pair of inclusions 
$A \sqsubseteq D$ and $D \sqsubseteq A$, while a concept name specification $A \mathrel{\dot{\leq}} D$ can be rewritten as the 
inclusion $A \sqsubseteq D$ (as already mentioned in Section 2). 

As a notation, given a TBox $\mathcal{T} = \{C_1 \sqsubseteq D_1, \ldots, C_n \sqsubseteq D_n\}$, we define the concept $C_{\mathcal{T}}$ 
as $C_{\mathcal{T}} = (\neg C_1 \sqcup D_1) \sqcap \cdots \sqcap (\neg C_n \sqcup D_n)$. As pointed out in (Baader, 1990a) for $\mathcal{ALC}$, an 
interpretation satisfies a TBox $\mathcal{T}$ if and only if it satisfies the equation $C_{\mathcal{T}} = \top$. This result 
easily extends to $\mathcal{ALCNR}$, as stated in the following proposition. 

Proposition 4.1 Given an $\mathcal{ALCNR}$-TBox $\mathcal{T} = \{C_1 \sqsubseteq D_1, \ldots, C_n \sqsubseteq D_n\}$, an interpretation 
$\mathcal{I}$ satisfies $\mathcal{T}$ if and only if it satisfies the equation $C_{\mathcal{T}} = \top$. 

Proof. An interpretation $\mathcal{I}$ satisfies an inclusion $C \sqsubseteq D$ if and only if it satisfies the equation 
$\neg C \sqcup D = \top$; $\mathcal{I}$ satisfies the set of equations $\neg C_1 \sqcup D_1 = \top, \ldots, \neg C_n \sqcup D_n = \top$ if and only 
if $\mathcal{I}$ satisfies $(\neg C_1 \sqcup D_1) \sqcap \cdots \sqcap (\neg C_n \sqcup D_n) = \top$. The claim follows. □ 
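
Proposition 4.1 can be checked mechanically on a finite interpretation. The following Python fragment is only an illustrative sketch: the tuple encoding of concepts, the dictionary layout of an interpretation, and all function names are assumptions made here, and only a fragment of the language (no number restrictions, no role conjunction) is covered.

```python
# Hypothetical tuple encoding of concepts (not taken from the paper):
#   ("top",), ("atom", name), ("not", C), ("and", C, D), ("or", C, D),
#   ("all", role, C)

def ext(c, interp):
    """Extension of concept c in a finite interpretation.

    interp is a dict with keys "domain" (set), "concepts" (name -> set)
    and "roles" (name -> set of pairs).
    """
    domain = interp["domain"]
    tag = c[0]
    if tag == "top":
        return set(domain)
    if tag == "atom":
        return set(interp["concepts"].get(c[1], set()))
    if tag == "not":
        return domain - ext(c[1], interp)
    if tag == "and":
        return ext(c[1], interp) & ext(c[2], interp)
    if tag == "or":
        return ext(c[1], interp) | ext(c[2], interp)
    if tag == "all":
        filler = ext(c[2], interp)
        pairs = interp["roles"].get(c[1], set())
        return {x for x in domain
                if all(y in filler for (u, y) in pairs if u == x)}
    raise ValueError(f"unknown constructor: {tag}")


def internalize(tbox):
    """Build C_T as the conjunction of (not C_i or D_i), starting from top."""
    c_t = ("top",)
    for (c, d) in tbox:
        c_t = ("and", c_t, ("or", ("not", c), d))
    return c_t


def satisfies_tbox(tbox, interp):
    """Direct check: the extension of C is a subset of that of D."""
    return all(ext(c, interp) <= ext(d, interp) for (c, d) in tbox)


def satisfies_top_equation(tbox, interp):
    """Check through Proposition 4.1: C_T denotes the whole domain."""
    return ext(internalize(tbox), interp) == interp["domain"]


# Toy data, invented for illustration: the TBox {Parent [= Person}.
tbox = [(("atom", "Parent"), ("atom", "Person"))]
good = {"domain": {1, 2, 3},
        "concepts": {"Parent": {1}, "Person": {1, 2}}, "roles": {}}
bad = {"domain": {1, 2, 3},
       "concepts": {"Parent": {3}, "Person": {1, 2}}, "roles": {}}
```

On both toy interpretations the two checks agree, as the proposition predicts: `good` passes both, `bad` fails both.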

Given a knowledge base $\Sigma = \langle \mathcal{A}, \mathcal{T} \rangle$ and a concept $A$ not appearing in $\Sigma$, we define the 
knowledge base $\Sigma' = \langle \mathcal{A}', \mathcal{T}' \rangle$ as follows: 

\[ \mathcal{A}' = \mathcal{A} \cup \{A(b) \mid b \text{ is an individual in } \Sigma\} \]

\[ \mathcal{T}' = \{A \mathrel{\dot{\leq}} C_{\mathcal{T}} \sqcap \forall P_1.A \sqcap \cdots \sqcap \forall P_n.A\} \]

where $P_1, P_2, \ldots, P_n$ are all the role names appearing in $\Sigma$. Note that $\mathcal{T}'$ has a single 
inclusion, which could also be thought of as one primitive concept specification. 
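
The construction of $\Sigma'$ is purely syntactic, so it can be sketched in a few lines of Python. The encoding below (tuples for concepts, tagged tuples for assertions) and the names `roles_in`, `internalize` and `reduce_kb` are assumptions introduced for illustration. Number restrictions and role conjunction are omitted, and the caller is assumed to supply a concept name that does not occur in the knowledge base.

```python
# Hypothetical encoding (not from the paper):
#   concepts:   ("top",), ("atom", name), ("not", C), ("and", C, D),
#               ("or", C, D), ("all", role, C)
#   assertions: ("inst", concept, individual) or ("rel", role, a, b)
#   TBox:       list of pairs (C, D), each read as the inclusion C [= D

def roles_in(c):
    """Role names occurring in a concept."""
    tag = c[0]
    if tag in ("top", "atom"):
        return set()
    if tag == "not":
        return roles_in(c[1])
    if tag in ("and", "or"):
        return roles_in(c[1]) | roles_in(c[2])
    if tag == "all":
        return {c[1]} | roles_in(c[2])
    raise ValueError(f"unknown constructor: {tag}")


def internalize(tbox):
    """Build C_T as the conjunction of (not C_i or D_i), starting from top."""
    c_t = ("top",)
    for (c, d) in tbox:
        c_t = ("and", c_t, ("or", ("not", c), d))
    return c_t


def reduce_kb(abox, tbox, fresh="A"):
    """Return (abox', tbox') for the knowledge base (abox, tbox).

    tbox' holds a single pair (A, body), read as the introduction
    A <= C_T and all P_1.A and ... and all P_n.A.
    The name `fresh` is assumed not to occur in abox or tbox (not checked).
    """
    individuals, roles = set(), set()
    for assertion in abox:
        if assertion[0] == "inst":
            roles |= roles_in(assertion[1])
            individuals.add(assertion[2])
        else:  # role assertion ("rel", role, a, b)
            roles.add(assertion[1])
            individuals.update(assertion[2:4])
    for (c, d) in tbox:
        roles |= roles_in(c) | roles_in(d)

    a_atom = ("atom", fresh)
    body = internalize(tbox)
    for p in sorted(roles):
        body = ("and", body, ("all", p, a_atom))
    new_abox = list(abox) + [("inst", a_atom, b) for b in sorted(individuals)]
    return new_abox, [(a_atom, body)]


# Toy knowledge base, invented for illustration.
abox = [("inst", ("atom", "Parent"), "mary"), ("rel", "child", "mary", "tom")]
tbox = [(("atom", "Parent"), ("atom", "Person"))]
new_abox, new_tbox = reduce_kb(abox, tbox)
```

Here `new_abox` extends the original assertions with `A(mary)` and `A(tom)`, and `new_tbox` contains the single introduction for `A`.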

Theorem 4.2 $\Sigma = \langle \mathcal{A}, \mathcal{T} \rangle$ is satisfiable if and only if $\Sigma' = \langle \mathcal{A}', \mathcal{T}' \rangle$ is satisfiable. 

Proof. In order to simplify the machinery of the proof, we will use for $\mathcal{T}'$ the following 
(logically equivalent) form: 

\[ \mathcal{T}' = \{A \sqsubseteq C_{\mathcal{T}},\ A \sqsubseteq \forall P_1.A,\ \ldots,\ A \sqsubseteq \forall P_n.A\} \]

(Note that we use the symbol '$\sqsubseteq$' instead of '$\dot{\leq}$': since the concept name $A$ now appears 
on the left-hand side of several statements, these statements must be read as inclusions.) 

"$\Rightarrow$" Suppose $\Sigma = \langle \mathcal{A}, \mathcal{T} \rangle$ is satisfiable. From Theorem 3.7, there exists a complete 
constraint system $S$ without clash, which defines a canonical interpretation $\mathcal{I}_S$ that is a 
model of $\Sigma$. Define the constraint system $S'$ as follows: 

\[ S' = S \cup \{w\colon A \mid w \text{ is an object in } S\} \]

and call $\mathcal{I}_{S'}$ the canonical interpretation associated with $S'$. We prove that $\mathcal{I}_{S'}$ is a model of 
$\Sigma'$. 

First observe that every assertion in $\mathcal{A}$ is satisfied by $\mathcal{I}_{S'}$ since $\mathcal{I}_{S'}$ is equal to $\mathcal{I}_S$ except 
for the interpretation of $A$, and $A$ does not appear in $\mathcal{A}$. Therefore, every assertion in $\mathcal{A}'$ 
is also satisfied by $\mathcal{I}_{S'}$, either because it is an assertion of $\mathcal{A}$, or (if it is an assertion of the 
form $A(b)$) by definition of $S'$. 

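Regarding $\mathcal{T}'$, note that by definition of $S'$, we have $A^{\mathcal{I}_{S'}} = \Delta^{\mathcal{I}_{S'}} = \Delta^{\mathcal{I}_S}$; therefore both 
sides of the inclusions of the form $A \sqsubseteq \forall P_i.A$ ($i = 1, \ldots, n$) are interpreted as $\Delta^{\mathcal{I}_{S'}}$, hence 
they are satisfied by $\mathcal{I}_{S'}$. Since $A$ does not appear in $C_{\mathcal{T}}$, we have that $(C_{\mathcal{T}})^{\mathcal{I}_{S'}} = (C_{\mathcal{T}})^{\mathcal{I}_S}$. 
Moreover, since $\mathcal{I}_S$ satisfies $\mathcal{T}$, we also have, by Proposition 4.1, that $(C_{\mathcal{T}})^{\mathcal{I}_S} = \Delta^{\mathcal{I}_S}$; 
therefore $(C_{\mathcal{T}})^{\mathcal{I}_{S'}} = (C_{\mathcal{T}})^{\mathcal{I}_S} = \Delta^{\mathcal{I}_S} = \Delta^{\mathcal{I}_{S'}}$. It follows that both sides of the inclusion 
$A \sqsubseteq C_{\mathcal{T}}$ are also interpreted as $\Delta^{\mathcal{I}_{S'}}$. In conclusion, $\mathcal{I}_{S'}$ satisfies $\mathcal{T}'$. 

"$\Leftarrow$" Suppose $\Sigma' = \langle \mathcal{A}', \mathcal{T}' \rangle$ is satisfiable. Again, because of Theorem 3.7, there exists a 
complete constraint system $S'$ without clash, which defines a canonical interpretation $\mathcal{I}_{S'}$ 
that is a model of $\Sigma'$. We show that $\mathcal{I}_{S'}$ is also a model of $\Sigma$. 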

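First of all, the assertions in $\mathcal{A}$ are satisfied because $\mathcal{A} \subseteq \mathcal{A}'$, and $\mathcal{I}_{S'}$ satisfies every 
assertion in $\mathcal{A}'$. To prove that $\mathcal{I}_{S'}$ satisfies $\mathcal{T}$, we first prove the following equation: 

\[ A^{\mathcal{I}_{S'}} = \Delta^{\mathcal{I}_{S'}} \qquad (2) \]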

Equation 2 is proved by showing that, for every object $s \in \Delta^{\mathcal{I}_{S'}}$, $s$ is in $A^{\mathcal{I}_{S'}}$. In order to do 
that, observe a general property of constraint systems: every variable in $S'$ is a successor of 
an individual. This comes from the definition of the generating rules, which add variables 
to the constraint system only as direct successors of existing objects, and at the beginning 
$S_{\Sigma'}$ contains only individuals. 

Then, Equation 2 is proved by observing the following three facts: 

1. for every individual $b$ in $\Delta^{\mathcal{I}_{S'}}$, $b \in A^{\mathcal{I}_{S'}}$; 

2. if an object $s$ is in $A^{\mathcal{I}_{S'}}$, then because $\mathcal{I}_{S'}$ satisfies the inclusions $A^{\mathcal{I}_{S'}} \subseteq (\forall P_1.A)^{\mathcal{I}_{S'}}, \ldots,$ 
$A^{\mathcal{I}_{S'}} \subseteq (\forall P_n.A)^{\mathcal{I}_{S'}}$, every direct successor of $s$ is in $A^{\mathcal{I}_{S'}}$; 

3. the successor relation is closed under the direct successor relation. 

From the Fundamental Theorem on Induction (see e.g., Wand, 1980, page 41) we conclude 
that every object $s$ of $\Delta^{\mathcal{I}_{S'}}$ is in $A^{\mathcal{I}_{S'}}$. This proves that Equation 2 holds. 
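
The inductive argument above amounts to a reachability computation over the role edges of a finite model. The sketch below is illustrative only; the function names and the toy model are assumptions made here, not part of the calculus. It shows that every object reachable from an individual ends up in the extension of $A$, and that an object not reachable from any individual (which cannot occur in a canonical interpretation) is not forced into it.

```python
from collections import deque


def successors_of(individuals, roles):
    """Objects reachable from the individuals through any role.

    The individuals themselves are included in the result.
    roles maps a role name to a set of (source, target) pairs.
    """
    edges = {}
    for pairs in roles.values():
        for (x, y) in pairs:
            edges.setdefault(x, set()).add(y)
    seen = set(individuals)
    queue = deque(individuals)
    while queue:
        x = queue.popleft()
        for y in edges.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def closed_under_roles(a_ext, roles):
    """Fact 2: every direct successor of an object in A is again in A."""
    return all(y in a_ext
               for pairs in roles.values()
               for (x, y) in pairs
               if x in a_ext)


# Toy model, invented for illustration: b is an individual, x1 and x2 are
# variables generated from it, z is an object unrelated to any individual.
individuals = {"b"}
roles = {"P": {("b", "x1"), ("x1", "x2")}}
a_ext = {"b", "x1", "x2"}          # extension of A; note that z is missing
reachable = successors_of(individuals, roles)
```

Since the individuals are in `a_ext` (fact 1) and `a_ext` is closed under roles (fact 2), `reachable` is a subset of `a_ext`. The object `z` shows why the weaker property "every variable is a successor of an individual" is needed for Equation 2.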

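From Equation 2, and the fact that $\mathcal{I}_{S'}$ satisfies the inclusion $A^{\mathcal{I}_{S'}} \subseteq (C_{\mathcal{T}})^{\mathcal{I}_{S'}}$, we derive 
that $(C_{\mathcal{T}})^{\mathcal{I}_{S'}} = \Delta^{\mathcal{I}_{S'}}$, that is, $\mathcal{I}_{S'}$ satisfies the equation $C_{\mathcal{T}} = \top$. Hence, from Proposition 
4.1, $\mathcal{I}_{S'}$ satisfies $\mathcal{T}$, and this completes the proof of the theorem. □ 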

The machinery present in this proof is not new. In fact, realizing that the inclusions 
$A \sqsubseteq \forall P_1.A, \ldots, A \sqsubseteq \forall P_n.A$ simulate a transitive closure on the roles $P_1, \ldots, P_n$, one can 
recognize similarities with the proofs given by Schild (1991) and Baader (1990a). The differ- 
ence is that their proofs rely on the notion of connected model (Baader uses the equivalent 
notion of rooted model). In contrast, the models we obtain are not connected, when the 
individuals in the knowledge base are not. What we exploit is the weaker property that 
every variable in the model is a successor of an individual. 

Note that the above reduction strongly relies on the fact that disjunction '$\sqcup$' and 
complement '$\neg$' are within the language. In fact, disjunction and complement are necessary 
in order to express all the inclusions of a TBox $\mathcal{T}$ inside the concept $C_{\mathcal{T}}$. Therefore, the 
proof holds for $\mathcal{ALC}$-knowledge bases, but does not hold for TKRSs not allowing for these 
constructors of concepts (e.g., BACK). 

Furthermore, for the language $\mathcal{FL}_0$ introduced in Section 4.1, the opposite result holds. 
In fact, McAllester (1991) proves that computing subsumption w.r.t. a set of inclusions is 
EXPTIME-hard, even in the small language $\mathcal{FL}_0$. Conversely, Nebel (1991) proves that 
subsumption w.r.t. a set of cyclic definitions in $\mathcal{FL}_0$ can be done in PSPACE. Combining 
the two results, we can conclude that for $\mathcal{FL}_0$ subsumption w.r.t. a set of inclusions and 
subsumption w.r.t. a set of definitions are in different complexity classes, hence (assuming 
EXPTIME $\neq$ PSPACE) inclusion statements are strictly more expressive than concept 
definitions in $\mathcal{FL}_0$. 

It is still open whether inclusions and definitions are equivalent in languages whose 
expressivity is between $\mathcal{FL}_0$ and $\mathcal{ALC}$. 

5. Discussion 

In this paper we have proved the decidability of the main inference services of a TKRS based 
on the concept language $\mathcal{ALCNR}$. We believe that this result is not only of theoretical 
importance, but bears some impact on existing TKRSs, because a complete procedure can 
be easily devised from the calculus provided in Section 3. From this procedure, one can build 
more efficient (but still complete) ones, as described at the end of Section 3.2, and also by 
applying standard optimization techniques such as those described in (Baader, Hollunder, 
Nebel, Profitlich, & Franconi, 1992). An optimized procedure can perform well for small 
sublanguages where reasoning is tractable, while still being complete when solving more 
complex tasks. However, such a complete procedure will still take exponential time and 
space in the worst case, and one may question its practical applicability. We 
comment on this point in the following. 

Firstly, a complete procedure (possibly optimized) offers a benchmark for comparing 
incomplete procedures, not only in terms of performance, but also in terms of missed 
inferences. Let us illustrate this point in detail, by providing a blatant paradox: consider the 
most incomplete constant-time procedure, which always answers "No" to any check. Obviously 
this useless procedure outperforms any other one, if missed inferences are not taken 
into account. This paradox shows that incomplete procedures can be meaningfully compared 
only if missed inferences are considered. But to recognize missed inferences over large 
examples, one needs exactly a complete procedure like ours, even if it is not an efficient one. 
We believe that a fair detection of missed inferences would be of great help even when the 
satisfaction of end users is the primary criterion for judging incomplete procedures. 
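
The benchmarking idea can be made concrete with a small Python sketch. Everything in it is hypothetical: the function names, the query format, and the reference answers (which stand in for the output of a complete procedure) are invented for illustration.

```python
def always_no(query):
    """The constant-time procedure from the text: it rejects every check."""
    return False


def evaluate(procedure, queries, reference):
    """Compare a procedure against the answers of a complete one.

    reference maps each query to the answer of a complete procedure.
    Returns (missed, unsound): inferences that hold but were rejected,
    and inferences that were accepted although they do not hold.
    """
    missed = [q for q in queries if reference[q] and not procedure(q)]
    unsound = [q for q in queries if procedure(q) and not reference[q]]
    return missed, unsound


# Invented subsumption queries (subsumee, subsumer) with made-up answers.
queries = [("Parent", "Person"), ("Person", "Parent"), ("Mother", "Person")]
reference = {
    ("Parent", "Person"): True,
    ("Person", "Parent"): False,
    ("Mother", "Person"): True,
}
missed, unsound = evaluate(always_no, queries, reference)
```

The always-"No" procedure is sound here (`unsound` is empty) and as fast as possible, but it misses both positive inferences, which only the reference answers reveal.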

Secondly, a complete procedure can be used for "anytime classification", as proposed 
in (MacGregor, 1992). The idea is to use a fast, but incomplete algorithm as a first step 
in analyzing the input knowledge, and then do more reasoning in the background. In the 
cited paper, resolution-based theorem provers are proposed for performing this background 
reasoning. We argue that any specialized complete procedure will perform better than a 
general theorem prover. For instance, theorem provers are usually not specifically designed 
to deal with filtration techniques. 

Moreover, our calculus can be easily adapted to deal with rules. As outlined in the 
introduction, rules are often used in practical TKRSs. Rules behave like one-way concept 
inclusions (no contrapositive is allowed) and they are applied only to known individuals. 
Our result shows that rules in $\mathcal{ALCNR}$ can be applied also to unknown individuals (our 
variables in a constraint system) without endangering decidability. This result is to be 
compared with the negative result in (Baader & Hollunder, 1992), where it is shown that 
subsumption becomes undecidable if rules are applied to unknown individuals in CLASSIC. 

Finally, the calculus provides a new way of building incomplete procedures, by modifying 
some of the propagation rules. Since the rules build up a model, modifications to them 
have a semantic counterpart which gives a precise account of the incomplete procedures 
obtained. For example, one could limit the size of the canonical model by a polynomial in 
the size of the KB. Semantically, this would mean considering only "small" models, which 
is reasonable when the intended models for the KB are not much bigger than the size of the 
KB itself. We believe that this way of designing incomplete procedures "from above", i.e., 
starting with the complete set of inferences and weakening it, is dual to the way incomplete 
procedures have been realized so far "from below", i.e., starting with already incomplete 
inferences and adding inference power as needed. 

Further research is still needed to address problems arising in practical systems. For 
example, to completely express role restrictions inside number restrictions, qualified number 
restrictions (Hollunder & Baader, 1991) should be taken into account. Also, the language 
resulting from the addition of enumerated sets (called ONE-OF in CLASSIC), and role fillers 
to $\mathcal{ALCNR}$ is still to be studied, although it does not seem to endanger the filtration 
method we used. By contrast, a different method might be necessary if inverse roles are added 
to $\mathcal{ALCNR}$, since the finite model property is lost (as shown in Schild, 1991). Finally, the 
addition of concrete domains (Baader & Hanschke, 1991) remains open. 